Remove Numbers from Text SVF, Nested vs CTE

I need to remove numbers from a string while processing some data as part of an ETL process in a data warehouse.  I have to do this for many columns and potentially many tables, so I decided to put the logic into a Scalar-Valued Function (SVF), understanding that SVFs carry a performance cost.  Generally this will be used on relatively small data sets for cleaning “codes” that will be inserted into a slowly changing dimension.  I was curious about the best way to do this.  My instinct was to use nested replace functions, so I did a quick Google search to see what options the collective would recommend.  I came across a post on Stack Overflow that mentioned the nested replace functions, and someone there also suggested a recursive Common Table Expression (CTE), which I thought was a creative suggestion.

https://stackoverflow.com/questions/13240298/remove-numbers-from-string-sql-server/
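To make the two ideas concrete, here is a rough Python sketch of the logic behind each option.  It is only an illustration and not the T-SQL from this post or from the Stack Overflow answers: the function names are hypothetical, and the recursive version only loosely mirrors how a recursive CTE would strip one digit per level.

```python
DIGITS = "0123456789"


def remove_digits_nested(text: str) -> str:
    """Mimic ten nested REPLACE calls: strip one digit character per pass."""
    for digit in DIGITS:
        text = text.replace(digit, "")
    return text


def remove_digits_recursive(text: str, digits: str = DIGITS) -> str:
    """Mimic the recursive idea: remove one digit, then recurse on the rest."""
    if not digits:
        # Base case: every digit character has been handled.
        return text
    return remove_digits_recursive(text.replace(digits[0], ""), digits[1:])


if __name__ == "__main__":
    sample = "AB12-C3"  # hypothetical "code" value
    print(remove_digits_nested(sample))
    print(remove_digits_recursive(sample))
```

Both functions return the same string for the same input; the only difference is whether the ten replacements are written out as a fixed chain or driven by recursion.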

I have a soft spot in my heart for recursive CTEs so I thought I’d try both options and see which solution performs better.  Let’s start with building the functions:

Now that we have some functions, we need some test data, so let’s create a simple table and populate it with semi-random data by leveraging the newid() function.

Two

I don’t sleep, but I suddenly wake
My head is a race track
Your toy car follows each age line in my face
I smile
New lines are formed

Today you are two

Height, weight, and age can be measured
You can not measure my patience
But you know how to test it
Bip the Clown is no mute
A handsome Blok nonetheless
You can not measure my happiness
But you know how to grant it

Preparing for work
Drab as a fool, aloof as a bard
I daydream we are an adagio pair
And you are Atlantis soaring
Toothpaste drips onto my shirt

Stomping
Here comes the son
Winked an eye as you pointed your finger…
“I pooped Dada!”

Soon there will be two

Simon Sinek: The Video That Will Change Your Life

My boss had our department watch this today to give us some perspective on personal and workplace relationships, and specifically on how to open the minds of our co-workers to business intelligence.  I tend to hate “motivational” or “self-help” videos, but this one has a science twist that really spoke to me.

About this presentation:
In this in-depth talk, ethnographer and leadership expert Simon Sinek reveals the hidden dynamics that inspire leadership and trust. In biological terms, leaders get the first pick of food and other spoils, but at a cost. When danger is present, the group expects the leader to mitigate all threats even at the expense of their personal well-being. Understanding this deep-seated expectation is the key difference between someone who is just an “authority” versus a true “leader.”

Introducing Google A.I. Experiments

Explore machine learning by playing with pictures, language, music, code, and more.

Check it out: https://aiexperiments.withgoogle.com/

Here is an inspiring TED talk on AI that takes a more positive view than usual of the future of artificial intelligence and how it will fit in with society.  One of the main points is the contrast between “making people want stuff we make” and “making stuff people want”, with AI giving us insight we would not otherwise have.

Backup Transaction Logs to Nul for All Online Databases

First of all, never use this in a production environment!  This script backs up your transaction logs to a “nul device” for all online databases that are not using the simple recovery model.  In Windows it is indeed spelled “nul” with one “L”.  The only reason you would want to do this is if you have a non-production environment using the full recovery model and this server architecturally mirrors your production environment.  For example, we have a “staging” server that is used for testing our code changes before they go into production.  We require the staging environment to be as close to production as possible and have scheduled scripts that sync them weekly.  In this scenario, we have many databases on the staging server that use the full recovery model, but we do not want to keep the t-log backups; we would rather just throw them away.

To learn more about NUL devices, here’s a link to the Wikipedia page: https://en.wikipedia.org/wiki/Null_device

Resolve SQL Server Fatal error: cannot create ‘R_TempDir’

I was trying to test an install of R in SQL Server 2016 and when running a script I received this error: Fatal error: cannot create ‘R_TempDir’

Following the instructions here, I enabled external scripts, restarted the SQL Server service, and then tried to run the following test script:

This is when the fatal error occurred.  As the error suggests, R is having some issues creating a temporary directory.  After some internet searching and trial and error I got past the issue.

Enable 8dot3 File Names

The R configuration uses the 8dot3 file name convention, also known as “short names”.  To enable this on Windows 10, run the following command in CMD (command prompt):

For more options and information look here: https://support.microsoft.com/en-us/kb/121007

Give access to the working directory to R

Locate the “rlauncher.config” file and open it in a text editor.  This file will be under the “<sqlserver_instance>\binn” directory.  Take a look at the location of WORKING_DIRECTORY.  This should be a “short name” file path, something like “<sqlserver_instance>\EXTENS~1”, where “\EXTENS~1” is equivalent to “\ExtensibilityData”.  We need to give R access to this folder.  I did this by granting full control to Everyone.  You may want to be more restrictive here, but in my case this did not matter.

  • Right click folder > Properties > Security tab > Advanced > Add
  • Select a principal (I entered “Everyone”)
  • Tick “Full control” under basic permissions and click “OK”
  • Tick “Replace all child object permission entries with inheritable permission entries from this object” and click “OK”

Now if you rerun the script above you should get a result of “hello, 1”.

Calculate Weighted Average in Excel using SumProduct() Function

A weighted average, also known as a weighted arithmetic mean, is similar to an ordinary average, except that instead of each data point contributing equally to the final average, each data point is “weighted” and thus contributes more or less depending on the given weight.  The weight would typically be some related data point that indicates the significance of the value being averaged.

For example, let’s say we were tracking the progress of a project and its various tasks.  Our data set includes the task number, percent completed, and estimated hours to complete the task.  We want to calculate an overall percent completed for the project based on how complete the individual tasks are.  Take a look at the example below.

[Image: weightedavg00]

=AVERAGE(B2:B6)

If we calculate a straight average of the percent complete column, we get 75% completed overall.  However, this could be misleading because some tasks will take longer to complete than others, as indicated in the estimated hours column.  Let’s use this estimated hours column as the “weight” in our weighted average.

One way to do this would be to:

  • multiply the item (% Complete) by the weight (Estimated Hours) at the row level, shown in column D below.
  • sum up all of those products in column D, and then
  • divide by the sum of the weight for all records (column C)

Formulas:
[Image: weightedavg01_formulas]

Results:
[Image: weightedavg01]

The result of the weighted average is 49%, which is very different from the 75% we got from a straight average.  This is because some items have a high percent completed but few estimated hours, while others have a low percent completed and many estimated hours.  By including a weight to factor in the level of effort for each item, you get a much more accurate result.
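For readers who want to check the arithmetic outside Excel, the three steps above can be sketched in Python.  The task values here are made up for illustration and are not the data from the screenshots, so the results differ from the 75% and 49% quoted above; the function names are hypothetical as well.

```python
def straight_average(values):
    """Ordinary mean: every value counts equally."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Multiply each value by its weight, sum the products, divide by total weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Equivalent of the "helper" column: one product per row.
    products = [value * weight for value, weight in zip(values, weights)]
    return sum(products) / total_weight


if __name__ == "__main__":
    # Hypothetical tasks: percent complete and estimated hours.
    percent_complete = [1.0, 1.0, 0.5, 0.25]
    estimated_hours = [2, 3, 10, 25]
    print(straight_average(percent_complete))                   # 0.6875
    print(weighted_average(percent_complete, estimated_hours))  # 0.40625
```

As in the spreadsheet, the small finished tasks pull the straight average up, while the weighted figure is dominated by the large task that is only a quarter done.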

What’s the problem with this approach? If the number of tasks changes then it becomes a fairly manual task to adjust the rows and formulas accordingly.  Also, I don’t like the idea of having a “helper” or “work” column inserted into the data set.  There is a quicker and simpler way to calculate the weighted average than the method I just explained.

An Information Archive for a Frivolous Human