This article is part of a series.
- Part 1 - HMIS, R, and SQL -- Introduction
- Part 2 - HMIS, R, and SQL -- Basics
- Part 3 - Preamble to Mixing R and SQL
- Part 4 - HMIS, R, SQL -- Work Challenge One
- Part 5 - This Article
- Part 6 - Working with Dates in SQLdf
- Part 7 - Filter to Most Recent HUD Assessment
- Part 8 - HMIS, R, SQL -- Work Challenge Two
- Part 9 - Comparing Values in R and SQL
- Part 10 - Working with R Excel Libraries
- Part 11 - HMIS, R, SQL -- Work Challenge Three
- Part 12 - HMIS, R, SQL -- Work Challenge Four
- Part 13 - SQL CASE and R Paste
R let’s us work with HMIS data, but there is still the problem of how to load the data into R from a source. R is actually pretty neat regarding data importing. We can load data from a website, SQL database, text file, Excel file, or CSV.
When working with HMIS data the two most common types of data source are Excel and CSV. Therefore, it’s going to pay to spend a little time on how to bring those files into R.
Importing CSVs into R
CSV stand for Comma Separated Value format. It is a near ancient file format which is used to store data in rows and columns. If you have a CSV file on the computer (in Windows) right-click on it and open it in Notepad. The contents should look something like this:
However, if you open the same CSV in Excel it will look something like this:
Let’s be honest, the second example is easier for humans to read (well, unless you’re an odd human). And for the most part, we will be looking at CSVs in Excel or in RStudio’s dataview which looks similar to Excel. However, it is important to note the easier to read version can hide data errors that may only be visible by viewing the raw CSV. Nothing to worry about now, but keep it in mind.
Alright, let’s explore how to load a CSV file into R. There is many way’s to do this, but let’s start with a super-easy way:
These two commands, when executed, will force R to create a file selection box. This will allow us to easily select the CSV to load into R. Once selected and we press
Ok then R will load the selected file’s path into the variable
The next command
read.csv() takes the path provided and attempts to load the file it points to and converts it into a dataframe. Once R creates a dataframe from the file selected it saves it in the variable
And that’s it. The data is now loaded into R and it can be manipulated with other commands.
Saving data from R is the other end of importing data. This process is often referred to as exporting data. Really, it’s simply taking a R dataframe and converting it into a file–once in a file form it can be loaded into Excel or emailed to a peer (but, please don’t email personal information–remember, CSVs are simply text).
To write data a CSV use
That’s it, pretty simple, eh? Well, there are a couple of catches. When R saves a CSV it does a few things which are annoying for using the data in other way. For example, let’s say we have data in R that looks like this:
However, after it is written to a file it will look like this:
Note the added
NA. It is a good practice to put an NA in places where there are no data. Unfortunately, when dealing with HMIS data sets the standard is to leave a blank instead. To get R to conform to this standard we use an option in the
na="" tells R to write the document without changing blanks into
NA. The result of the code above should look like: