SQL CASE and R Paste

August 09, 2017

Reading time ~7 minutes

SQL Case

The SQL `CASE` function is one of my favorite. The command basically works like if-then command. If you are familiar with if-then commands, then feel free to skip this next bit.

If-Then

One of the reasons we have the amazing devices we do today is because a computer is capable of reasoning. A computer can compare two things and decide which one it likes.

Now, this may sound simple, but it’s actually a subtle miracle. Anyone who has been stuck on the toothpaste isle trying to decide between the 45 kinds of toothpaste probably understands making decisions is difficult. Of course, human decision making and computer decision making are not even on the same level. Humans can make comparisons of all 45 products at once(sort of). Computers, they have to make a decision between two objects, then, two objects, then two objects, so forth, until it has made it through all 45. Fortunately, computers can make these decisions blazing fast.

In computer programming we call this computer decision making process control flow. But let’s write some pseudocode for a little better understanding:

Pretty simple, right? The only thing a computer can’t do is decide if it likes Toothpaste 1 on its own. We have to program it to do that.

Well, this sentence makes sense if a computer is trying to decide to buy toothpaste or no toothpaste, but what if there are more than two toothpaste options? We just create another if-then statement.

Because the computer makes decisions in order it read them, then if it buys Toothpaste 1 then it will not buy Toothpaste 2. However, if he doesn’t like Toothpaste 1 the best, then if he thinks Toothpaste 2 is the best he’ll buy it. Otherwise, he will not buy any toothpaste–which makes sense, computers don’t have teeth.

This is almost everything we need to know about `if-then`, two more little catches.

First, what do we do if the computer doesn’t like any of the Toothpaste and don’t want him to just give up? We need a way to say, “Look computer, if you don’t like any toothpaste the best then go ask for help.”

In programming this is known as `if-then-else` statements. They are similar to `if-then` but with a contingency clause if something goes wrong.

Let’s take a look:

Ok, that’s it. Now let’s apply it to SQL.

SQL CASE WHEN

SQL applies if-then logic in several ways. We’ve already looked at the `WHERE` statement, which basicaly works like an `if-then`.

See how this could be written as

But the most likely SQL statement used for `if-then-else` logic is the `CASE WHEN` statement.

Here’s an example to be run in R.

Here is the output:

PersonalID FirstName LastName DOB Gender Millennial
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01 Male Yes
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV Fela Falla 1999-1-1 Female No
LASDU89NRABVJWW779W4JGGAN90IQ5B2 Sarah Kerrigan 1992-04-01 Female No

The SQL query, specifically the `CASE WHEN` statement created a column called `Millennial`, it then went through every person’s date of birth, comparing it. When the query found a person who was born after 2000-01-01 it inserted a ‘Yes’ in the Millennial column. If they were not born after 2000-01-01 then it set the `Millennial` column to ‘No.’ Nifty, right?

Notice, the `ELSE` is required to get the ‘No’. Otherwise, the query would leave everyone else blank.

Here’s a few more examples of using CASE WHEN for powerful results.

Using OR with CASE WHEN

PersonalID FirstName LastName DOB Gender Cool
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01 Male PersonIsCool
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV Fela Falla 1999-1-1 Female NotHip
LASDU89NRABVJWW779W4JGGAN90IQ5B2 Sarah Kerrigan 1992-04-01 Female PersonIsCool

Using AND with CASE WHEN

PersonalID FirstName LastName DOB Gender Queen of Blades
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01 Male
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV Fela Falla 1999-1-1 Female
LASDU89NRABVJWW779W4JGGAN90IQ5B2 Sarah Kerrigan 1992-04-01 Female Yes

Using SUM with CASE WHEN

Using `CASE WHEN` in combination with `SUM` is a great way to get counts of different discrete data. Below is an example of getting total counts of males and females within the peopleDf

NumberOfFemales NumberOfMales
2 1

Using Multiple CASES

So far, we’ve only covered one `if-then` statement, but in our example with the toothpaste we could string them together. The same can be done with `CASE WHEN`.

PersonalID FirstName LastName DOB Gender Generation
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01 Male NotYetDefined
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV Fela Falla 1999-1-1 Female Y
LASDU89NRABVJWW779W4JGGAN90IQ5B2 Sarah Kerrigan 1992-04-01 Female Y

Paste

The `paste()` in R is meant for manipulating strings of text. You pass it strings as parameters and it returns one string containing all the strings passed into it. Let’s take a look.

After running this line the `greeting` variable contains the following string `Hello how are you, Bob?`. This can be used by printing the contents of the variable using the `print()`

Side note, `print()` will actually print out anything you pass it to the console. This can be useful when trying to debug code.

Back to our combined strings, notice whenever the greeting prints out there is a space inserted between ‘you,’ and ‘Bob?’, this is done automatically by paste. It will insert a space between every string you pass it, unless you pass the additional parameter `sep`. This parameter will take whatever you set it as and insert it between the two strings.

This time `print()` will display “Hello how are you,!!Bob?” in the console. But, inserting exclamation marks is probably not what we want. Most of the time we will not want paste to insert anything and we can tell it to insert nothing.

Print will spit out “Hello how are you,Bob?”. Notice, there is no longer any character between “you,” and “Bob?”.

Paste is a pretty straightforward function, the one last trick is knowing you can pass in multiple strings.

This will produce the string “Hello how are you, Bob?”. Notice the spaces were inserted manually so the end string is readable to humans.

Dynamic SQL with Paste()

Prepare to have your mind blown. One of the powers of the `paste()` is building a `sqldf` string. Remember using SQLdf like this?

This creates the table

PersonalID FirstName LastName DOB
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01

This is a dataframe of everyone who was born after January 1st, 2001. This method of filtering data works for a static date. But let’s say you wanted to easily change out the `2001-01-01` with other dates. You could replace the date with a different date, but when that date is in multiple SQL calls it can be easy to miss one. A better way to do it is using the `paste()`. And remember, everything inside the `sqldf()` parentheses is a string.

Ok, let’s take this slow, there’s a lot going on. First, we create a variable called `targetDate` and assign it the string `2001-01-01`. Next, we create a complex string using the `paste()` which looks a lot like a SQLdf string, but instead of hardcoding the date, we insert the `targetDate` variable. This creates the following string:

Which is then inserted into the variable `sqlString`, which is a string.

Lastly, we pass the `sqlString` variable into the `sqldf()` which executes the fancy SQL query. Awesome, right?

Now, if we want to look at those born after a different date, we simply change the `targetDate` variable and re-run the script.

What is a Data Warehouse

## Insights over DataData. They are the plastic of the tech world. We're are making way too much of it, you can't seem to get rid of it, ...… Continue reading

Scraping Images from Google Search Results using Python

Published on December 23, 2020

Train a Generative Adversarial Network to Create Magic Symbols

Published on November 16, 2020