Read and Write CSVs in R

Reading time ~3 minutes

R let’s us work with HMIS data, but there is still the problem of how to load the data into R from a source. R is actually pretty neat regarding data importing. We can load data from a website, SQL database, text file, Excel file, or CSV.

When working with HMIS data the two most common types of data source are Excel and CSV. Therefore, it’s going to pay to spend a little time on how to bring those files into R.

Importing CSVs into R

CSV stand for Comma Separated Value format. It is a near ancient file format which is used to store data in rows and columns. If you have a CSV file on the computer (in Windows) right-click on it and open it in Notepad. The contents should look something like this:

PersonalID,FirstName,LastName
B7YIOJIGF9CDP6FV7TANQXLMQRMBTVTB,Bob,Person
ASGJ4F95HS85N39DJ12AJB094M59DJ45,Jane,People

However, if you open the same CSV in Excel it will look something like this:

PersonalID FirstName LastName
B7YIOJIGF9CDP6FV7TANQXLMQRMBTVTB Bob Person
ASGJ4F95HS85N39DJ12AJB094M59DJ45 Jane People

Let’s be honest, the second example is easier for humans to read (well, unless you’re an odd human). And for the most part, we will be looking at CSVs in Excel or in RStudio’s dataview which looks similar to Excel. However, it is important to note the easier to read version can hide data errors that may only be visible by viewing the raw CSV. Nothing to worry about now, but keep it in mind.

Alright, let’s explore how to load a CSV file into R. There is many way’s to do this, but let’s start with a super-easy way:

pathToCsv <- file.choose()
myCsvAsADataFrame <- read.csv(pathToCsv)
 

These two commands, when executed, will force R to create a file selection box. This will allow us to easily select the CSV to load into R. Once selected and we pressOk then R will load the selected file’s path into the variable pathToCsv.

The next command read.csv() takes the path provided and attempts to load the file it points to and converts it into a dataframe. Once R creates a dataframe from the file selected it saves it in the variable myCsvAsADataFrame

And that’s it. The data is now loaded into R and it can be manipulated with other commands.

Writing CSVs

Saving data from R is the other end of importing data. This process is often referred to as exporting data. Really, it’s simply taking a R dataframe and converting it into a file–once in a file form it can be loaded into Excel or emailed to a peer (but, please don’t email personal information–remember, CSVs are simply text).

To write data a CSV use write.csv().

For example:

write.csv(theDataFrameToWrite, "NameOfFile.csv")

That’s it, pretty simple, eh? Well, there are a couple of catches. When R saves a CSV it does a few things which are annoying for using the data in other way. For example, let’s say we have data in R that looks like this:

PersonalID Name SSN
123JJKLDFWE234234JGD0238D2342346   123-45-6589
B7YIOJIGF9CDP6FV7TANQXLMQRMBTVTB Bob Purdy  
DSK329GJB9234J5JFSDF94056NDUNVDF Frank 123-99-9999

However, after it is written to a file it will look like this:

PersonalID Name SSN
123JJKLDFWE234234JGD0238D2342346 NA 123-45-6589
B7YIOJIGF9CDP6FV7TANQXLMQRMBTVTB Bob Purdy NA
DSK329GJB9234J5JFSDF94056NDUNVDF Frank 123-99-9999

Note the added NA. It is a good practice to put an NA in places where there are no data. Unfortunately, when dealing with HMIS data sets the standard is to leave a blank instead. To get R to conform to this standard we use an option in the write.csv() function.

For example:

write.csv(theDataFrameToWrite, "NameOfFile.csv", na="")

The na="" tells R to write the document without changing blanks into NA. The result of the code above should look like:

PersonalID Name SSN
123JJKLDFWE234234JGD0238D2342346   123-45-6589
B7YIOJIGF9CDP6FV7TANQXLMQRMBTVTB Bob Purdy  
DSK329GJB9234J5JFSDF94056NDUNVDF Frank 123-99-9999

What is a Data Warehouse

## Insights over DataData. They are the plastic of the tech world. We're are making way too much of it, you can't seem to get rid of it, ...… Continue reading