This article is part of a series.
- Part 1 - HMIS, R, and SQL -- Introduction
- Part 2 - HMIS, R, and SQL -- Basics
- Part 3 - Preamble to Mixing R and SQL
- Part 4 - HMIS, R, SQL -- Work Challenge One
- Part 5 - Read and Write CSVs in R
- Part 6 - Working with Dates in SQLdf
- Part 7 - Filter to Most Recent HUD Assessment
- Part 8 - HMIS, R, SQL -- Work Challenge Two
- Part 9 - Comparing Values in R and SQL
- Part 10 - Working with R Excel Libraries
- Part 11 - HMIS, R, SQL -- Work Challenge Three
- Part 12 - HMIS, R, SQL -- Work Challenge Four
- Part 13 - This Article
CASE function is one of my favorite. The command basically works like if-then command. If you are familiar with if-then commands, then feel free to skip this next bit.
One of the reasons we have the amazing devices we do today is because a computer is capable of reasoning. A computer can compare two things and decide which one it likes.
Now, this may sound simple, but it’s actually a subtle miracle. Anyone who has been stuck on the toothpaste isle trying to decide between the 45 kinds of toothpaste probably understands making decisions is difficult. Of course, human decision making and computer decision making are not even on the same level. Humans can make comparisons of all 45 products at once(sort of). Computers, they have to make a decision between two objects, then, two objects, then two objects, so forth, until it has made it through all 45. Fortunately, computers can make these decisions blazing fast.
In computer programming we call this computer decision making process control flow. But let’s write some pseudocode for a little better understanding:
Pretty simple, right? The only thing a computer can’t do is decide if it likes Toothpaste 1 on its own. We have to program it to do that.
Well, this sentence makes sense if a computer is trying to decide to buy toothpaste or no toothpaste, but what if there are more than two toothpaste options? We just create another if-then statement.
Because the computer makes decisions in order it read them, then if it buys Toothpaste 1 then it will not buy Toothpaste 2. However, if he doesn’t like Toothpaste 1 the best, then if he thinks Toothpaste 2 is the best he’ll buy it. Otherwise, he will not buy any toothpaste–which makes sense, computers don’t have teeth.
This is almost everything we need to know about
if-then, two more little catches.
First, what do we do if the computer doesn’t like any of the Toothpaste and don’t want him to just give up? We need a way to say, “Look computer, if you don’t like any toothpaste the best then go ask for help.”
In programming this is known as
if-then-else statements. They are similar to
if-then but with a contingency clause if something goes wrong.
Let’s take a look:
Ok, that’s it. Now let’s apply it to SQL.
SQL CASE WHEN
SQL applies if-then logic in several ways. We’ve already looked at the
WHERE statement, which basicaly works like an
See how this could be written as
But the most likely SQL statement used for
if-then-else logic is the
CASE WHEN statement.
Here’s an example to be run in R.
Here is the output:
The SQL query, specifically the
CASE WHEN statement created a column called
Millennial, it then went through every person’s date of birth, comparing it. When the query found a person who was born after 2000-01-01 it inserted a ‘Yes’ in the Millennial column. If they were not born after 2000-01-01 then it set the
Millennial column to ‘No.’ Nifty, right?
ELSE is required to get the ‘No’. Otherwise, the query would leave everyone else blank.
Here’s a few more examples of using CASE WHEN for powerful results.
Using OR with CASE WHEN
Using AND with CASE WHEN
|PersonalID||FirstName||LastName||DOB||Gender||Queen of Blades|
Using SUM with CASE WHEN
CASE WHEN in combination with
SUM is a great way to get counts of different discrete data. Below is an example of getting total counts of males and females within the peopleDf
Using Multiple CASES
So far, we’ve only covered one
if-then statement, but in our example with the toothpaste we could string them together. The same can be done with
paste() in R is meant for manipulating strings of text. You pass it strings as parameters and it returns one string containing all the strings passed into it. Let’s take a look.
After running this line the
greeting variable contains the following string
Hello how are you, Bob?. This can be used by printing the contents of the variable using the
print() will actually print out anything you pass it to the console. This can be useful when trying to debug code.
Back to our combined strings, notice whenever the greeting prints out there is a space inserted between ‘you,’ and ‘Bob?’, this is done automatically by paste. It will insert a space between every string you pass it, unless you pass the additional parameter
sep. This parameter will take whatever you set it as and insert it between the two strings.
print() will display “Hello how are you,!!Bob?” in the console. But, inserting exclamation marks is probably not what we want. Most of the time we will not want paste to insert anything and we can tell it to insert nothing.
Print will spit out “Hello how are you,Bob?”. Notice, there is no longer any character between “you,” and “Bob?”.
Paste is a pretty straightforward function, the one last trick is knowing you can pass in multiple strings.
This will produce the string “Hello how are you, Bob?”. Notice the spaces were inserted manually so the end string is readable to humans.
Dynamic SQL with Paste()
Prepare to have your mind blown. One of the powers of the
paste() is building a
sqldf string. Remember using SQLdf like this?
This creates the table
This is a dataframe of everyone who was born after January 1st, 2001. This method of filtering data works for a static date. But let’s say you wanted to easily change out the
2001-01-01 with other dates. You could replace the date with a different date, but when that date is in multiple SQL calls it can be easy to miss one. A better way to do it is using the
paste(). And remember, everything inside the
sqldf() parentheses is a string.
Ok, let’s take this slow, there’s a lot going on. First, we create a variable called
targetDate and assign it the string
2001-01-01. Next, we create a complex string using the
paste() which looks a lot like a SQLdf string, but instead of hardcoding the date, we insert the
targetDate variable. This creates the following string:
Which is then inserted into the variable
sqlString, which is a string.
Lastly, we pass the
sqlString variable into the
sqldf() which executes the fancy SQL query. Awesome, right?
Now, if we want to look at those born after a different date, we simply change the
targetDate variable and re-run the script.