Remove Numbers from Text SVF, Nested vs CTE

I need to remove numbers from a string while processing some data  as part of an ETL process in a data warehouse.  I have to do this for many columns and potentially many tables so I decided to put the logic into a Scalar-Valued Function (SVF), understanding there are performance detriments when using SVFs. Generally this will be used on relatively small data sets for cleaning “codes” that will be insert into a slowly changing dimension.  I was curious about how to do this.  My instinct was to do it using nested replace functions.  I did a quick google search to see what options the collective would recommend.  I came across a post on Stack Overflow that mentioned the nested replace functions, and then someone also mentioned using a recursive Common Table Expression (CTE), which I thought was a creative suggestion.

https://stackoverflow.com/questions/13240298/remove-numbers-from-string-sql-server/

I have a soft spot in my heart for recursive CTEs so I thought I’d try both options and see which solution performs better.  Let’s start with building the functions:

Now that we have some functions we need some test data so let’s create a simple table and populate it with simi-random data by leveraging the newid() function.

Run the test by calling each function against the test table.  I made the calls as separate batches to the server so that I can grab some execution query statistics afterwards.  I ran the test 10 times to get a satisfactory sample set for the performance test.

I measured performance by utilizing the execution query statistics.  You can not easily do this any other way.  The execution plan and IO statistics for scalar value functions are nearly useless.

Here is the result.  As you can see, the nested replace function performs much better than the CTE.  I hypothesized this would be the case, but it’s nice to be able to prove it so you can feel comfortable moving forward with a solution.

TypeAvg ReadsAvg Workers Time (ms)Avg Elapsed Time (ms)
CTE102,010681746
Nested92525

To learn more about performance of user defined functions (UDFs) in general, see this great post on mssqltips:
https://www.mssqltips.com/sqlservertip/4689/understand-the-performance-behavior-of-sql-server-scalar-user-defined-functions/

Leave a Reply

Your email address will not be published. Required fields are marked *