SQL Case

The SQL CASE function is one of my favorite. The command basically works like if-then command. If you are familiar with if-then commands, then feel free to skip this next bit.

If-Then

One of the reasons we have the amazing devices we do today is because a computer is capable of reasoning. A computer can compare two things and decide which one it likes.

Now, this may sound simple, but it’s actually a subtle miracle. Anyone who has been stuck on the toothpaste isle trying to decide between the 45 kinds of toothpaste probably understands making decisions is difficult. Of course, human decision making and computer decision making are not even on the same level. Humans can make comparisons of all 45 products at once(sort of). Computers, they have to make a decision between two objects, then, two objects, then two objects, so forth, until it has made it through all 45. Fortunately, computers can make these decisions blazing fast.

In computer programming we call this computer decision making process control flow. But let’s write some pseudocode for a little better understanding:

    If (Computer Likes Toothpaste 1) then buy Toothpaste 1

Pretty simple, right? The only thing a computer can’t do is decide if it likes Toothpaste 1 on its own. We have to program it to do that.

Well, this sentence makes sense if a computer is trying to decide to buy toothpaste or no toothpaste, but what if there are more than two toothpaste options? We just create another if-then statement.

    If (Computer Likes Toothpaste 1 Best) then buy Toothpaste 1
    If (Computer Likes Toothpaste 2 Best) then buy Toothpaste 2

Because the computer makes decisions in order it read them, then if it buys Toothpaste 1 then it will not buy Toothpaste 2. However, if he doesn’t like Toothpaste 1 the best, then if he thinks Toothpaste 2 is the best he’ll buy it. Otherwise, he will not buy any toothpaste–which makes sense, computers don’t have teeth.

This is almost everything we need to know about if-then, two more little catches.

First, what do we do if the computer doesn’t like any of the Toothpaste and don’t want him to just give up? We need a way to say, “Look computer, if you don’t like any toothpaste the best then go ask for help.”

In programming this is known as if-then-else statements. They are similar to if-then but with a contingency clause if something goes wrong.

Let’s take a look:

    if (Computer Likes Toothpaste 1 Best) then buy Toothpaste 1
    if (Computer Likes Toothpaste 2 Best) then buy Toothpaste 2
    else Go Ask a Computer Dentist what to buy

Ok, that’s it. Now let’s apply it to SQL.

SQL CASE WHEN

SQL applies if-then logic in several ways. We’ve already looked at the WHERE statement, which basicaly works like an if-then.

    SELECT * FROM data WHERE Name = 'Bob'

See how this could be written as

    SELECT * FROM data IF Name = 'Bob'

But the most likely SQL statement used for if-then-else logic is the CASE WHEN statement.

Here’s an example to be run in R.

library(sqldf)
################### Data DO NOT CHANGE ###########################
peopleDf <- data.frame(PersonalID=c("ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7", "IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV", "LASDU89NRABVJWW779W4JGGAN90IQ5B2"), 
                       FirstName=c("Timmy", "Fela", "Sarah"),
                       LastName=c("Tesa", "Falla", "Kerrigan"),
                       DOB=c("2010-01-01", "1999-1-1", "1992-04-01"))
##################################################################

peopleDf1 <- sqldf("SELECT *, 
                  CASE WHEN DOB > '2000-1-1' THEN 'Yes' ELSE 'No' END As 'Millennial' 
                  FROM peopleDf")

Here is the output:

PersonalID FirstName LastName DOB Gender Millennial
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01 Male Yes
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV Fela Falla 1999-1-1 Female No
LASDU89NRABVJWW779W4JGGAN90IQ5B2 Sarah Kerrigan 1992-04-01 Female No

The SQL query, specifically the CASE WHEN statement created a column called Millennial, it then went through every person’s date of birth, comparing it. When the query found a person who was born after 2000-01-01 it inserted a ‘Yes’ in the Millennial column. If they were not born after 2000-01-01 then it set the Millennial column to ‘No.’ Nifty, right?

Notice, the ELSE is required to get the ‘No’. Otherwise, the query would leave everyone else blank.

Here’s a few more examples of using CASE WHEN for powerful results.

Using OR with CASE WHEN

peopleDf2 <- sqldf("SELECT *, 
                  CASE WHEN DOB > '2000-1-1' OR FirstName = 'Sarah' THEN 'PersonIsCool' ELSE 'NotHip' END As 'Cool?' 
                  FROM peopleDf")
PersonalID FirstName LastName DOB Gender Cool
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01 Male PersonIsCool
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV Fela Falla 1999-1-1 Female NotHip
LASDU89NRABVJWW779W4JGGAN90IQ5B2 Sarah Kerrigan 1992-04-01 Female PersonIsCool

Using AND with CASE WHEN

peopleDf3 <- sqldf("SELECT *, 
                  CASE WHEN FirstName = 'Sarah' AND LastName = 'Kerrigan' THEN 'Yes' ELSE '' 
                  END As 'Queen of Blades' 
                  FROM peopleDf")
PersonalID FirstName LastName DOB Gender Queen of Blades
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01 Male  
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV Fela Falla 1999-1-1 Female  
LASDU89NRABVJWW779W4JGGAN90IQ5B2 Sarah Kerrigan 1992-04-01 Female Yes

Using SUM with CASE WHEN

Using CASE WHEN in combination with SUM is a great way to get counts of different discrete data. Below is an example of getting total counts of males and females within the peopleDf

count1 <- sqldf("SELECT 
                  SUM(
                      CASE WHEN Gender = 'Female' THEN 1 ELSE 0 END
                    ) As 'NumberOfFemales',
                  SUM(
                      CASE WHEN Gender = 'Male' THEN 1 ELSE 0 END
                    ) As 'NumberOfMales'
                   FROM peopleDf")
NumberOfFemales NumberOfMales
2 1

Using Multiple CASES

So far, we’ve only covered one if-then statement, but in our example with the toothpaste we could string them together. The same can be done with CASE WHEN.

peopleDf4 <- sqldf("SELECT *, CASE WHEN DOB >= '1980-01-01' AND DOB < '1990-01-01' THEN 'X'
                           WHEN DOB >= '1990-01-01' AND DOB < '2000-01-01' THEN 'Y'
                           WHEN DOB >= '2000-01-01' AND DOB < '2010-01-01' THEN 'Millennial'
                           WHEN DOB >= '2010-01-01' AND DOB < '2020-01-01' THEN 'NotYetDefined'
                           END As 'Generation'
                   FROM peopleDf")
PersonalID FirstName LastName DOB Gender Generation
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01 Male NotYetDefined
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV Fela Falla 1999-1-1 Female Y
LASDU89NRABVJWW779W4JGGAN90IQ5B2 Sarah Kerrigan 1992-04-01 Female Y

Paste

The paste() in R is meant for manipulating strings of text. You pass it strings as parameters and it returns one string containing all the strings passed into it. Let’s take a look.

greeting <- paste("Hello how are you,", "Bob?")

After running this line the greeting variable contains the following string Hello how are you, Bob?. This can be used by printing the contents of the variable using the print()

print(greeting)

Side note, print() will actually print out anything you pass it to the console. This can be useful when trying to debug code.

Back to our combined strings, notice whenever the greeting prints out there is a space inserted between ‘you,’ and ‘Bob?’, this is done automatically by paste. It will insert a space between every string you pass it, unless you pass the additional parameter sep. This parameter will take whatever you set it as and insert it between the two strings.

greeting <- paste("Hello how are you,", "Bob?", sep = "!!")
print(greeting)

This time print() will display “Hello how are you,!!Bob?” in the console. But, inserting exclamation marks is probably not what we want. Most of the time we will not want paste to insert anything and we can tell it to insert nothing.

greeting <- paste("Hello how are you,", "Bob?", sep = "")
print(greeting)

Print will spit out “Hello how are you,Bob?”. Notice, there is no longer any character between “you,” and “Bob?”.

Paste is a pretty straightforward function, the one last trick is knowing you can pass in multiple strings.

greeting <- paste("Hello", " how are you,", " Bob?", sep = "")
print(greeting)

This will produce the string “Hello how are you, Bob?”. Notice the spaces were inserted manually so the end string is readable to humans.

Dynamic SQL with Paste()

Prepare to have your mind blown. One of the powers of the paste() is building a sqldf string. Remember using SQLdf like this?

library(sqldf)
################### Data DO NOT CHANGE ###########################
peopleDf <- data.frame(PersonalID=c("ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7", "IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV", "LASDU89NRABVJWW779W4JGGAN90IQ5B2"), 
                       FirstName=c("Timmy", "Fela", "Sarah"),
                       LastName=c("Tesa", "Falla", "Kerrigan"),
                       DOB=c("2010-01-01", "1999-1-1", "1992-04-01"))
##################################################################

peopleDf1 <- sqldf("SELECT * FROM peopleDf WHERE DOB > '2001-01-01'")

This creates the table

PersonalID FirstName LastName DOB
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 Timmy Tesa 2010-01-01

This is a dataframe of everyone who was born after January 1st, 2001. This method of filtering data works for a static date. But let’s say you wanted to easily change out the 2001-01-01 with other dates. You could replace the date with a different date, but when that date is in multiple SQL calls it can be easy to miss one. A better way to do it is using the paste(). And remember, everything inside the sqldf() parentheses is a string.

targetDate <- "2001-01-01"
sqlString <- paste("SELECT * FROM peopleDf WHERE DOB > '", targetDate, "'", sep = "")
peopleDf5 <- sqldf(sqlString)

Ok, let’s take this slow, there’s a lot going on. First, we create a variable called targetDate and assign it the string 2001-01-01. Next, we create a complex string using the paste() which looks a lot like a SQLdf string, but instead of hardcoding the date, we insert the targetDate variable. This creates the following string:

"SELECT * FROM peopleDf WHERE DOB > '2001-01-01'"

Which is then inserted into the variable sqlString, which is a string.

Lastly, we pass the sqlString variable into the sqldf() which executes the fancy SQL query. Awesome, right?

Now, if we want to look at those born after a different date, we simply change the targetDate variable and re-run the script.

targetDate <- "1980-01-01"
sqlString <- paste("SELECT * FROM peopleDf WHERE DOB > '", targetDate, "'", sep = "")
peopleDf5 <- sqldf(sqlString)

Sys.Date()

GSUB

Creating Reusable Code

Writing report code which can be reused is critical to being an effective report specialist. By now, hopefully, you see the power of SQL-R, especially around HMIS data. But you may still feel slow. Or have thoughts like, “If I pulled these data into Excel I could manually filter them in 1/10th the time.” That’s probably true. But, after manually filtering dataset after dataset it becomes apparent finding a way to automate some tasks would save many hours in the long-run. Thus, writing an R scripts for routine work would save countless hours of monotony.

However, one problem remains, each task will usually have a slight variation from the one before it. This causes you to write 95% of the same code with a slight tweak for the current project. And that doesn’t save time at all. In the programming world, the 95% code which is the same is known as bolierplate code.

Ok, that’s the problem. The solution? Functions.

A function is nothing more than a section of code you save into a variable for easy reuse.

Defining a function looks like this:

myNewFunction <- function(){
  # Code you want to run goes here.
}

Then, whenever you want to use this code it can be called like this:

myNewFunction()

If you want to pass the function something to use:

myNewFunction <- function(clientDf){
  clientDf$VeteranStatus
}
clientDf <- read.csv(clientCsvPath)
myNewFunction(clientDf)

And the coolest thing about functions is being able to return data. Functions return whatever data is on the last line of the function. This can be a tricky concept, but at its root it is simple.

Here, the clientDf will be returned.

myNewFunction <- function(clientDf){
  clientDf$VeteranStatus[clientDf$VeteranStatus == "1"]
  clientDf
}
clientDf <- read.csv(clientCsvPath)
veteranList <- myNewFunction(clientDf)

The result is then passed back out of the function, where it can be assigned to a new variable.

You may notice, this is similar to a lot of code we have been using. Like read.csv. That’s because read.csv is a function written by the makers of R and included for our use.

clientDf <- read.csv(clientCsvPath)

This is how R has become powerful tool. Many smart people have written sets of functions, which are called libraries. Feel the power of open-source.

Time to give back to community and write some of our own functions

Data Needed

For this work challenge you will need:

  1. Client.csv
  2. Enrollment.csv
  3. Project.csv
  4. Exit.csv

The Goal

Write functions which will do the following:

  • Join clientDf, enrollmentDf, projectDf, exitDf and return the combined dataframe.
  • Make the following columns readable:
    • Gender
    • VeteranStatus
    • DisablingCondition
    • RelationshipToHoH
    • ResidencePriorLengthOfStay
    • LOSUnderThreshold
    • PreviousStreetESSH
    • TimesHomelessPastThreeYears
    • MonthsHomelessPastThreeYears
    • Destination
  • Get most recent HUD Assessment per PersonalID
  • Filter to clients who are active in programs (except Night-by-Night and Street Outreach projects)
  • Write a function to filter enrollmentDf based upon a user defined parameter.

BONUS

  • Write a function which returns a list of Chronically Homeless individuals.

For the last function, here’s an example,

clientsWithDisablingCondition <- getSubpopulation(df, "DisablingCondition", "Yes")

The function you’d write would be getSubpopulation(). The first parameter would be the dataframe the user is passing into your function. Second parameter is the column to look at. The last is which response the user wants in the column to look in.

The Resources

Below are the resources which should help for each step:

  • R Programming A-Z – Video 21 – Functions in R
  • paste()

Individuals Experiencing Homelessness

This graph shows the trend of those homeless in Tarrant County, week-to-week who meet the following conditions:

  1. The person counted has stayed at least one night in a Night-by-nNight shelter within 90-days of the week counted.
  2. Or the person counted has been contacted by Street Outreach within 90-days of the week counted.
  3. Or the person was active in an Entry / Exit shelter program within the week of the count.

Most likely the count is inflated approximately 33%, given there is a large known number of duplicates in the count. The software used to generate the data has no administrator option to merge duplicates. A request has been made for mass merger.

Active in Rapid Rehousing

Another trend found in the graph is a week-to-week count of those homeless who are active in a Rapid Rehousing (RRH) project.

The duplicate issue should not be as pronounced here, as even if a duplicate where created during the sheltered phase of a participant’s stay in homelessness, then only one of the pair would be enrolled into the housing project. Therefore, enrollment into housing is a natural filter.

Active in Permanent Supportive Housing

This trend is similar to the RRH trend.

Notice the line is flat. This is to be expected, as entry and exits are rare in Permanent Supportive Housing projects.

Subpopulations

This graph relates to the Trends of Homelessness, Rapid Rehousing, and Permanent Supportive Housing graph. It looks at the last week of the same data. Of those participants who are still actively homeless (and therefore eligible for housing), what sorts of barriers do these individuals face. HUD refers to these groups of individuals with particular difficulties as “subpopulations.”

It is important to understand these barriers are not mutually exclusive. For example, Jane could report both a Mental Health Problem and Substance Abuse Disorder and she would therefore be counted in both sub-populations.

The three are categories defined as follows:

  • Eligible for Rapid Rehousing are individuals who are actively in a homeless situation and are not met the chronically homeless threshold.
  • Eligible for Permanent Supportive Housing are individuals who are actively in a homeless situation are have met the threshold of chronically homeless
  • All Eligible for Housing is the sum of both Eligible for Rapid Rehousing and Eligible for Permanent Supportive Housing
  • It should be noted, Eligible for Rapid Rehousing and Eligible for Permanent Supportive Housing are mutually exclusive. Therefore, the All Eligible for Housing is an accurate count save the duplicates described above.

Trend of Subpopulations

Churning Data into Information

I work with a lot of data on the behalf of an agency without a lot of money. Exploring free-to-use and open-source tools is key to being effective in my job.

Recently, I’ve written a a couple of series on how to use R and SQL to sort through Homeless Management Information System data.

These data are essential to local governments helping individuals experiencing homelessness to be housed quickly and appropriately.

But one area R and SQL have not delivered is on-line interactive dashboards. Data is one thing, but easy to digest information is really key to informing stakeholders how the system is working to end homelessness.

In other projects I’ve attempted to generate graphs as images and upload to a static link. Then, each time the data change re-generate replace the image. But, most website servers cache the images so it is not ideal.

This has pushed me to try to learn D3.

I’m not going to lie, I’ve felt confused by languages, IDEs, and libraries. And I’ve overcome most of the these challenges. But I’ve never been so confused as by the layout and syntax of D3. The dyslexic feeling I get trying to work in D3 has discouraged me from spending too much time on it.

But recently I decided to take another stab at it– this time I lucked out and found the C3.js.

Essentially, C3 is a library which greatly simplifies D3. It boils down building a graph into a set of options passed to the C3 graph builder as a JSON object.

This code:

var chart = c3.generate({
    data: {
        x: 'Date',
        y: '# Individuals',
        xFormat: '%Y-%m-%d',
        url: 'https://ladvien.com/projects/d3/data/trendsInTX601.csv',
        type: 'line',
        // colors: {
        //     Count: '#990000'
        // }
        names: {
            NumberHomeless: "Homeless",
            NumberInRRH: "Rapid Rehousing",
            NumberInPSH: "Permanent Supportive Housing"
        }
    },
    
    title: {
        text: "Homeless or Formerly Homeless in TX-601"
    },

    legend: {
        show: true
    },

    axis: {
        x: {
            type: 'timeseries',
            tick: {
                count: 4,
                format: '%Y-%m-%d',
                // rotate: 90,
                multiline: false,
                
                culling: {
                    max:5 
                }
            }
        },
        y: {
            max: 3000,
            min: 0,
            label: "# Individuals"
            // Range includes padding, set 0 if no padding needed
            // padding: {top:0, bottom:0}
        },
    },
    
    point: {
        r: 0
    }
});

Using this CSV:

Produces the following graph:

One Hiccup

I did run into a one hiccup in setup. It seems the most recent version of d3 (version 4.0) has had much of its API overhauled. In such, it will not work with C3. But D3 v3 is still available from the D3 CDN:

<script src="https://d3js.org/d3.v3.min.js"></script>

Calling this library and following the instructions outlined by the C3 site, you can be generating graphs in little time.

Updating Data Securely and On Schedule

Now that I’ve the ability to use R and SQL to sort through my data, and I could quickly generate graphs using D3 and C3, it’d be really nice if a lot of this could be automated. And luckily, I’d run into a few other tools which made it pretty easy to replace the data on my C3 graphs.

Rsync

Rsync is primarily a Linux tool, but it is available on Windows as well. It is nice since it will allow you to quickly reconcile two file-trees (think of a manual Dropbox).

It will also allow you to sync a local file tree with a server file tree across an SSH connection. For example, I use the following command to sync the data mentioned above to the server

rsync -avz /Users/user/data/js-practice/d3/* ladvien@ladvien.com:/usr/share/nginx/html/projects/d3/

After running this command it will prompt for a password to access the server. Then, it will proceed to sync the two file-trees. Nifty!

This allows me to quickly update the data on the graph. Now, if only there were a way to automatically insert my password, then I could write a script to automate the whole process.

Python Keyring

Python Keyring is a tool which allows you to save and retrieve passwords from your PC’s keyring.

It is compatible with:

  • Mac OS X Keychain
  • Freedesktop Secret Service (requires secretstorage)
  • KWallet (requires dbus)
  • Windows Credential Vault

If you have Python installed you can install the Keyring tool with Pip:

$pip install keyring

After, you can store a password in the keyring by using the command-line tool. You will need to replace username with the name of your server login.

$keyring set system username

And retrieve it with:

$keyring get system username

This is great. It means we can store our password in the keyring and retrieve it securely from a script.

Great! Now we could write a script to have Rsync sync the any data changes locally with the server. Right? Well, almost. We needed one more tool.

SSHPass

There is a problem with using Rsync to sync files remotely from a script. When Rsync is called from a script it will not wait for parameters to be passed to the tool. Sigh.

Luckily, I’m not the only with this problem and a tool was created to solve this problem.

If you are on a Mac you’ll need to use Brew to install SSHPass.

brew install https://raw.githubusercontent.com/kadwanev/bigboybrew/master/Library/Formula/sshpass.rb 

There we go! Now we can automate the whole process.

I wrote this script to do the dirty work:

#!/bin/sh
PASSWORD=("$(keyring get system ladvien.com)")
ECHO ""
ECHO "****************************"
ECHO "* Updating D3 Projects     *"
ECHO "****************************"
ECHO ""
sshpass -p "$PASSWORD" rsync -avz /Users/user/data/js-practice/d3/* root@ladvien.com:/usr/share/nginx/html/projects/d3/

Cron

Ok! One last bit of sugar on this whole process. Let’s create a Cron job. This will run the script in the background at an interval of our choosing.

For me, I’ve a staff who pulls data and runs a master script every Monday. So, I’ll set my automated script to update my C3 graph data on Tuesday, when I know new data is available.

You can use Nano to edit your Cron job list.

env EDITOR=nano crontab -e

To run a Cron job on Tuesday we would set the fifth asterisk to 2.

* * * * 2 /the/path/to/our/update_script.sh

And don’t forget to make the update_script.sh executable.

chmod +x update_script.sh

I’m a hacker hacking with a hacksaw!

Setup Headless WiFi on Re4son's Kali Pi

I bought a few Raspberry Pi Zero W’s for $10. It was happenstance I also purchased the Udemy course Learn Ethical Hacking from Scratch. I figure, I might as well put these things together.

I also discovered the Sticky Fingers Kali Pi kernel and distros put together by Re4son.

It has worked well so far. However, I’ve not fully tested the Bluetooth LE hardware on the custom kernel.

One of the issues I’ve had is not being able to connect to new hotspots headlessly. Usually, you’d boot the rp0w connected to a monitor, keyboard, mouse, and edit wpa_supplicant.conf directly. But what if you want to go into a new location with only your laptop and the rp0w. How would you add the wifi credentials to the rp0w without a monitor, etc.

For awhile, I tried to get the ethernet gadget setup to work on the rp0w without any luck. I think the problems relates to trying to use the gadget hardware on a Mac rather than a Windows machine.

In the end, I decided I would add a script which would do the following:

  1. Mount the /boot partition (which is editable through PC’s SD card reader).
  2. Look for a file on the /boot called “wpa_supplicant.txt” and copy it to the /etc/wpa_supplicant.conf
  3. Look for a file on the /boot called “interfaces.txt” and copy it to the /etc/networks/interfaces
  4. Unmount /boot
  5. Remove the /boot directory

I saved this script in /root as wifi_setup.sh. I then added a call to it in /etc/rc.local

#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
/root/wifi_setup.sh || exit 1
exit 0

Here’s the wifi_setup.sh

#!/bin/bash

if [ ! -d "/boot" ]; then
        echo 'Mounting /boot'
        cd ..
        mkdir /boot
        mount /dev/mmcblk0p1 /boot
fi

if [ -f "/boot/wpa_supplicant.txt" ]; then
        echo 'Applying wpa_supplicant'
        cp /boot/wpa_supplicant.txt /etc/wpa_supplicant.conf
        mv /boot/wpa_supplicant.txt /boot/wpa_supplicant.applied.txt
fi

if [ -f "/boot/interfaces.txt" ]; then
        echo 'Applying intefaces'
        cp /boot/interfaces.txt /etc/network/interfaces
        mv /boot/interfaces.txt /boot/interfaces.applied
fi

umount /boot
rm -r /boot

This has let me add a new network from my laptop with merely an SD card reader.