Category Archives: MS SQL Server 2008 R2

Performance of Like vs Left/Right for Indexed and Non Indexed Columns

I have heard and read many times over the years that Left or Right should always be used, where possible, instead of Like. On the surface this sounds logical: Like is often used for pattern searching, so one could assume it makes the SQL Server engine do some extra work even when the predicate expressions are essentially equivalent. I have always assumed that they perform equivalently, but I had never measured it to find out. I was also curious whether there is a difference in performance when they are applied to an indexed column vs a non-indexed column.

The names and data structures in this example have been changed to protect the innocent. For this test, assume [SOME_TABLE] has 530 million records and contains two columns: [INDEXED_COLUMN], and [NONINDEXED_COLUMN].  Indexed_Column has a clustered index and Nonindexed_Column does not have an index.  We will run a test using like and left against each column using a basic count(*) command.
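
The original queries are not shown here, so the following is only a sketch of what the two predicates being compared might look like. The table and column names are the placeholders from the paragraph above; the prefix 'ABC' and its length of 3 are assumptions made purely for illustration.

```sql
-- Query 1: LIKE with a fixed prefix and a trailing wildcard
SELECT COUNT(*)
FROM [SOME_TABLE]
WHERE [INDEXED_COLUMN] LIKE 'ABC%';    -- 'ABC' is a hypothetical prefix

-- Query 2: LEFT, comparing the first 3 characters to the same prefix
SELECT COUNT(*)
FROM [SOME_TABLE]
WHERE LEFT([INDEXED_COLUMN], 3) = 'ABC';
```

For the non-indexed test, the same pair would be run with [NONINDEXED_COLUMN] in place of [INDEXED_COLUMN].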

Test 1 – Non Indexed Column

Output for queries 1 and 2 = 63 million.

  • Query 1 used like and averaged 102 seconds over 5 measurements.
  • Query 2 used left and averaged 103 seconds over 5 measurements.

Execution plans for queries 1 and 2: [figure: like vs left, no index]

Test 2 – Indexed Column

Output for queries 1 and 2 = 87 million.

  • Query 1 used like and averaged 3 seconds over 5 measurements.
  • Query 2 used left and averaged 14 seconds over 5 measurements.

Execution plans for queries 1 and 2: [figure: like vs left, index]

Conclusion

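You can see from the results that Like is considerably faster than Left when applied to an indexed column. This is because a Like pattern with a fixed prefix and a trailing wildcard is sargable: the optimizer can turn it into a range seek on the index. Left, on the other hand, is a function wrapped around the column, so it has to be evaluated for every row and the index can only be scanned (short of indexing a computed column that holds the Left result). If an index is not involved then Like and Left perform the same.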

There might be additional factors that impact performance which I am not considering, such as other index types and other data types. I came across another blog (http://cc.davelozinski.com/sql/like-vs-substring-vs-leftright-vs-charindex) where a similar test was done. Surprisingly, their results showed that Charindex is the most efficient out of Like, Substring, Left/Right, and Charindex, and their results for Like vs Left/Right are quite different from mine. At some point I’d like to go through their code and identify exactly what is different.

Edit: The author reached out to me and mentioned that the main difference between our results is likely due to my analysis being run on SQL Server 2008 R2 while theirs was done on SQL Server 2014, and there have been changes to the query engine between versions. I encourage anyone seeing this to check out Dave’s blog.

More about Indexes on Computed Columns: https://msdn.microsoft.com/en-us/library/ms189292.aspx
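
As a rough illustration of what the linked article covers, a Left expression can be made seekable by storing it in a computed column and indexing that column. This was not part of the test above; the column name, index name, and prefix length of 3 are hypothetical, and on a table of this size both statements would be expensive to run.

```sql
-- Hypothetical computed column holding the first 3 characters
ALTER TABLE [SOME_TABLE]
    ADD [NONINDEXED_PREFIX] AS LEFT([NONINDEXED_COLUMN], 3) PERSISTED;

-- Index the computed column so equality searches on the prefix can seek
CREATE NONCLUSTERED INDEX IX_SOME_TABLE_NONINDEXED_PREFIX
    ON [SOME_TABLE] ([NONINDEXED_PREFIX]);

-- Query the computed column directly
SELECT COUNT(*)
FROM [SOME_TABLE]
WHERE [NONINDEXED_PREFIX] = 'ABC';     -- 'ABC' is a hypothetical prefix
```

Indexes on computed columns require the expression to be deterministic (Left is) and certain SET options to be in effect; the linked article lists them.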

Check SQL Server Version Using T-SQL

If you are building some code that would be different depending on the version of SQL Server then it may be useful to check the SQL server version using T-SQL instead of going through the object explorer in SSMS.  For example, I have some code that checks for SSIS package execution status across multiple servers and the database design is slightly different between server versions so I need to check the version and then execute the proper block of code.
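
A minimal sketch of such a check, assuming the branch point is SQL Server 2012 (major version 11); the PRINT statements are placeholders for the version-specific code blocks. SERVERPROPERTY('ProductVersion') returns a string of the form major.minor.build.revision, so PARSENAME can be used to pull out the major version.

```sql
-- Full version banner as a single string
SELECT @@VERSION AS VersionBanner;

-- Individual properties, easier to use in code
DECLARE @version varchar(20) = CAST(SERVERPROPERTY('ProductVersion') AS varchar(20));
DECLARE @major int = CAST(PARSENAME(@version, 4) AS int);  -- leftmost of the 4 parts

SELECT @version                        AS ProductVersion,
       @major                          AS MajorVersion,
       SERVERPROPERTY('ProductLevel')  AS ProductLevel,
       SERVERPROPERTY('Edition')       AS Edition;

-- Branch on the major version (10 = 2008 / 2008 R2, 11 = 2012)
IF @major >= 11
    PRINT 'SQL Server 2012 or later: run the newer code block here';
ELSE
    PRINT 'SQL Server 2008 R2 or earlier: run the older code block here';
```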

 

Helpful SQL Server Links

Microsoft White Papers:
https://msdn.microsoft.com/en-us/library/ee410014(v=sql.105).aspx

Microsoft Books Online:
https://technet.microsoft.com/en-us/library/ms130214(v=sql.105).aspx

Pragmatic Works Training Resources:
http://pragmaticworks.com/Training/Resources

SQL Server Database Engine Permission Posters:
http://social.technet.microsoft.com/wiki/contents/articles/11842.sql-server-database-engine-permission-posters.aspx

TechFunda SQL Server Tutorials:
http://techfunda.com/howto/sql-server

Microsoft SQL Server Tutorials:
https://msdn.microsoft.com/en-us/library/ms167593

Microsoft SQL Server Community Projects & Samples:
http://sqlserversamples.codeplex.com/

SQL Pass DBA Fundamentals – Web Presentation Archives
http://fundamentals.sqlpass.org/MeetingArchive.aspx

Blogs and Other Great Resources:
https://blogs.msdn.microsoft.com/sql_server_team/ – MSSQL Tiger Team
https://www.mssqltips.com/
http://blog.sqlauthority.com/
http://www.brentozar.com/
http://www.edwinmsarmiento.com/
http://kevinsgoff.net/
https://www.sqlskills.com/
http://blog.datainspirations.com/ – BI blog by Stacia

A better list put together by New Horizons:
http://nhlearningsolutions.com/Blog/TabId/145/ArtMID/16483/ArticleID/1186/Free-Resources-for-SQL-Server.aspx

SQL Injections:
http://www.unixwiz.net/techtips/sql-injection.html

Show SP_Who and Rollback Status for All SPIDS Rolling Back

We had to kill a session and roll it back, and wanted more information about the status of the rollback. If you only have one SPID to check then you can run the kill command with the STATUSONLY option, which reports rollback progress without killing anything further.
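
For a single session, the check looks something like this. The session ID 53 is made up for the example.

```sql
-- Report rollback progress for one session that is already being killed.
-- 53 is a hypothetical SPID; replace it with the session in question.
KILL 53 WITH STATUSONLY;
```

The progress report comes back as a message (estimated percent complete and time remaining), not as a result set.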

I wanted to expand this by dynamically looking for all sessions in rollback status and then provide some information about the session as well as the kill status.  This way, I do not need to first manually review sessions and look for records in rollback status, then check each kill status.

In my test there was only one SPID in rollback status, but it should also work if there is more than one.
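
The script itself is not shown here, so this is only a sketch of one way to do it, not necessarily the original approach. It assumes sessions that are rolling back show the command 'KILLED/ROLLBACK' in sys.dm_exec_requests, and it uses dynamic SQL because KILL does not accept a variable.

```sql
DECLARE @spid int;
DECLARE @spid_text sysname;
DECLARE @sql nvarchar(100);

-- Find every session currently rolling back
DECLARE rollback_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT session_id
    FROM sys.dm_exec_requests
    WHERE command = 'KILLED/ROLLBACK';

OPEN rollback_cursor;
FETCH NEXT FROM rollback_cursor INTO @spid;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @spid_text = CAST(@spid AS nvarchar(10));

    -- Session details (login, host, database, command)
    EXEC sp_who @spid_text;

    -- Rollback progress, returned as a message
    SET @sql = N'KILL ' + @spid_text + N' WITH STATUSONLY;';
    EXEC sp_executesql @sql;

    FETCH NEXT FROM rollback_cursor INTO @spid;
END

CLOSE rollback_cursor;
DEALLOCATE rollback_cursor;
```

The sp_who output appears in the results pane and the kill status in the messages pane, one pair per SPID.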

List SQL Servers On Network

A list of servers can be obtained using the Windows command prompt (CMD) or PowerShell (PS). Both provide Server Name and Instance Name, but PowerShell provides two additional fields: IsClustered and SQL Server Version. One challenge I have noticed is that running these commands from my local machine only returns a short list with names that look like computer names rather than server names. I had to run the commands from a production server, a QA server, and a development server to obtain the server lists for those environments. I suspect there is some network separation between them, which I am still investigating.

CMD: OSQL -L or SQLCMD -L

PS: [System.Data.Sql.SqlDataSourceEnumerator]::Instance.GetDataSources()

Check if SQL Server is Clustered

For more information on SERVERPROPERTY(): https://msdn.microsoft.com/en-us/library/ms174396.aspx
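
A minimal sketch of the check, using the IsClustered property of SERVERPROPERTY(). It returns 1 if the instance is clustered, 0 if it is not, and NULL if the input is invalid or an error occurs.

```sql
SELECT SERVERPROPERTY('MachineName')  AS MachineName,
       SERVERPROPERTY('ServerName')   AS ServerName,
       SERVERPROPERTY('IsClustered')  AS IsClustered;   -- 1 = clustered, 0 = not clustered
```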

Search for Column Name in All Tables in All Databases

Use a text wild card to search for a column name in all tables in each database.  Stored procedure sp_msforeachdb is used to cycle through each database.  The sys tables are used to obtain table and column information.  The output will include Database Name, Schema Name, Table Name, Column Name, and code to easily select the top 10 from a table returned in the result.
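
The description above can be sketched roughly as follows. The search pattern, the temp table name, and the column aliases are assumptions for illustration. Note that sp_MSforeachdb is undocumented, replaces every ? in the command with the database name, and limits the command to 2000 characters.

```sql
DECLARE @search nvarchar(128) = N'%customer%';   -- hypothetical search pattern
DECLARE @cmd nvarchar(2000);

IF OBJECT_ID('tempdb..#column_search') IS NOT NULL
    DROP TABLE #column_search;

CREATE TABLE #column_search (
    DatabaseName sysname,
    SchemaName   sysname,
    TableName    sysname,
    ColumnName   sysname,
    SampleQuery  nvarchar(1000)
);

-- Command run once per database; ? is replaced with the database name
SET @cmd = N'USE [?];
INSERT INTO #column_search
SELECT DB_NAME(), s.name, t.name, c.name,
       N''SELECT TOP 10 * FROM '' + QUOTENAME(DB_NAME()) + N''.''
       + QUOTENAME(s.name) + N''.'' + QUOTENAME(t.name) + N'';''
FROM sys.columns AS c
JOIN sys.tables  AS t ON t.object_id = c.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE c.name LIKE N''' + REPLACE(@search, N'''', N'''''') + N''';';

EXEC sp_MSforeachdb @cmd;

SELECT *
FROM #column_search
ORDER BY DatabaseName, SchemaName, TableName, ColumnName;
```

Collecting the rows in a temp table gives one combined result set instead of one per database. Databases that are offline or otherwise inaccessible will raise an error for that iteration.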

Random n% Records – Fastest Method

Method Source: https://msdn.microsoft.com/en-us/library/cc441928.aspx
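
The query itself is not shown here. As best I recall, the linked article builds a per-row pseudo-random number from a checksum and keeps the rows that fall under the desired percentage; the sketch below is a variation on that idea rather than a copy of the article's code. The table and column names are placeholders, and 10 is the example percentage.

```sql
-- Return roughly 10% of the rows, chosen at random.
-- NEWID() yields a new value for every row, so the checksum differs per row and per run.
-- The modulo is taken before ABS so that ABS never sees the minimum int value.
SELECT *
FROM [SOME_TABLE]
WHERE ABS(BINARY_CHECKSUM([INDEXED_COLUMN], NEWID()) % 100) < 10;
```

This avoids sorting the whole table, which is what makes ORDER BY NEWID() slow on large tables. The percentage returned is approximate, not exact.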

Row Count on Large Tables Fast

There are ~550 million records in a table I am trying to get a row count on.  Typically I have always used the “COUNT(*)” method to get a row count. I was curious if there is a faster way.  I had read in some forums that “COUNT(1)” or “COUNT(Primary_Key)” could be theoretically faster but there was some argument on the topic.  I decided to run execution plans on all three to see how the query engine is handling it.  According to the execution plan all three operations are treated the same.
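
For reference, the three variants being compared would look something like this. The table and primary key column names are placeholders.

```sql
SET STATISTICS TIME ON;   -- report elapsed time for each statement in the messages pane

SELECT COUNT(*) FROM [SOME_TABLE];
SELECT COUNT(1) FROM [SOME_TABLE];
SELECT COUNT([PRIMARY_KEY_COLUMN]) FROM [SOME_TABLE];   -- hypothetical primary key column

SET STATISTICS TIME OFF;
```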

[figure: row count execution plan]

I then decided to confirm that they are treated the same by running each one three times and taking the average execution time. Although the results show small differences, I believe these are only caused by fluctuations in server load from other processes. With enough measurements they should converge.

Results:
COUNT(*) = 15.0 seconds
COUNT(1) = 16.3 seconds
COUNT(PK) = 18.0 seconds

Change in Direction

For my purposes I am trying to quickly obtain a count for all tables and databases on a server. This is a large production data warehouse with many databases and hundreds of tables. I want to minimize the impact to the server, and in this case that is more important to me than accuracy. An alternative to reading the table directly is to get the row counts from sys.dm_db_partition_stats. This is a Dynamic Management View (DMV) that returns information about each partition, and by summing it up for a given object you can obtain the row count. This row count is an approximation; changes made by transactions that are still in flight may not be reflected.
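
A minimal sketch of the DMV approach for every user table in the current database. Only the heap (index_id 0) or clustered index (index_id 1) is summed, so rows are not counted once per nonclustered index. The column aliases are my own.

```sql
SELECT s.name             AS SchemaName,
       t.name             AS TableName,
       SUM(ps.row_count)  AS ApproxRowCount
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)          -- heap or clustered index only
GROUP BY s.name, t.name
ORDER BY ApproxRowCount DESC;
```

The DMV is scoped to the current database, so covering a whole server means running this once per database.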

[figure: dm_db_partition_stats execution plan]

Although the execution plan looks more complicated, it does not have to read a large table and instead only reads one or a handful of records via the system DMV.

Considerations

To stress the point: this is a fast but not necessarily accurate method to get a row count. If you are doing some validation, such as comparing tables, then I recommend using count() and possibly checksum().

More Information About dm_db_partition_stats: https://msdn.microsoft.com/en-us/library/ms187737.aspx