I need to remove numbers from a string while processing some data as part of an ETL process in a data warehouse. I have to do this for many columns and potentially many tables so I decided to put the logic into a Scalar-Valued Function (SVF), understanding there are performance detriments when using SVFs. Generally this will be used on relatively small data sets for cleaning “codes” that will be insert into a slowly changing dimension. I was curious about how to do this. My instinct was to do it using nested replace functions. I did a quick google search to see what options the collective would recommend. I came across a post on Stack Overflow that mentioned the nested replace functions, and then someone also mentioned using a recursive Common Table Expression (CTE), which I thought was a creative suggestion.
https://stackoverflow.com/questions/13240298/remove-numbers-from-string-sql-server/
I have a soft spot in my heart for recursive CTEs so I thought I’d try both options and see which solution performs better. Let’s start with building the functions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
|
--create Nested function create function udf_RemoveNumbers_Nested ( -- Add the parameters for the function here @String varchar(200) ) returns varchar(200) as begin -- Declare the return variable here declare @Result varchar(200); -- Add the T-SQL statements to compute the return value here select @Result = replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( @String, '0', '' ) , '1', '' ) , '2', '' ) , '3', '' ) , '4', '' ) , '5', '' ) , '6', '' ) , '7', '' ) , '8', '' ) , '9', '' ); -- Return the result of the function return @Result; end; go --creat CTE function alter function udf_RemoveNumbers_CTE ( -- Add the parameters for the function here @String varchar(200) ) returns varchar(200) as begin -- Declare the return variable here declare @Result varchar(200); -- Add the T-SQL statements to compute the return value here ; with cte as ( select @String as string , 0 as n union all select convert( varchar(200), replace( string, n, '' )) , n + 1 from cte where n <= 9 ) select @Result = ( select top 1 string from cte order by n desc ); -- Return the result of the function return @Result; end; go |
Now that we have some functions we need some test data so let’s create a simple table and populate it with simi-random data by leveraging the newid() function.
Continue reading Remove Numbers from Text SVF, Nested vs CTE →