Chapter 3 File Management
Supplement to POL90
3.1 Overview
Typically, when you get an error loading data, the path to the file is incorrect. Below is some brief background and a handful of possible solutions.
3.2 Background: Working Directories
3.2.0.1 R Working Directory
When working interactively in R, the program has what is called a “working directory.” This is where R expects to start looking for files. When R sees a command like read.csv("my_data.csv"), it looks in the working directory and, if it doesn’t find the file there, returns an error. To see your working directory, type getwd() in your CONSOLE and hit return.
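A minimal sketch of checking what R can see before you try to load a file. The file name my_data.csv is the placeholder used above, and the variable name in_wd is just for illustration.

```r
# Where is R currently looking for files?
print(getwd())

# Is the file visible from the working directory?
# "my_data.csv" is a placeholder; use your own file name.
in_wd <- file.exists("my_data.csv")
print(in_wd)

# List the CSV files R can see in the working directory
print(list.files(pattern = "\\.csv$"))
```

If file.exists() returns FALSE, the file is not in the working directory, and read.csv("my_data.csv") would fail for the same reason.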
3.2.0.2 Rmd Working Directory
Separate from the R working directory, every R Markdown file assumes that the working directory is the folder in which that Rmd file is stored. So, if "my_file.Rmd" and "my_data.csv" are in the same folder, then something like read.csv("my_data.csv") should work when knitting but may not work interactively.
In lots of cases, though, our Rmd and data are not in the same folder. Or we’re working interactively, in which case R may use the R working directory (which may be different from where the Rmd file is stored). Again, you can check the R working directory by typing getwd() in your CONSOLE and hitting return.
3.3 Possible Solutions
To help R find the file, we can do one of several things:
3.3.0.1 1. Hard Coded File Path (easy to do, easy to break)
A simple solution is to provide R with a file path that shows exactly where the file is stored on your computer.
Within R, one way to find the file path is to go to the CONSOLE area of RStudio, type file.choose(), and hit enter/return. A window will pop up and, once you find the file and select it, R will return the path to that file. You then need to copy and paste that file path into your read.csv() or read.dta(), etc., as in
anes <- read_dta("/Users/owasow/Research/anes/anes_timeseries_2020_gss_bridge_20220408.dta")
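A small sketch of that workflow: file.choose() opens the file picker and returns the selected path as a character string. The interactive() guard and the variable name data_path are additions for illustration, and the sketch assumes the file you pick is a CSV.

```r
# file.choose() needs a person to click on a file, so only call it
# in an interactive session (not while knitting).
if (interactive()) {
  data_path <- file.choose()      # returns the full path as a string
  print(data_path)                # this is the path to copy into your script
  my_data <- read.csv(data_path)  # assumes the chosen file is a CSV
}
```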
On a Mac, you can find the path to a file with the following simple steps:
Go to Finder and locate the file on your computer. Click once on your file.
CLICK on the EDIT menu, hold down the OPTION key and select COPY “MY_FILE” AS PATHNAME
On Windows, try:
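Go to File Explorer and locate the file on your computer. Click once on your file.
Hold down SHIFT, RIGHT-CLICK on the file, and select COPY AS PATH. Windows copies the path with backslashes; in R, change each backslash to a forward slash (or double it) before using the path.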
There are two main downsides to this approach, though.
First, if you move your file to a new folder, the path will break.
Second, if you are collaborating with others, each person will typically have a different hard coded file path that will need to be changed depending on who is working on the code.
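Because hard coded paths break silently when a file moves, it can help to check the path before reading. A minimal sketch, assuming a hypothetical path and a CSV file (data_path and the folder names are placeholders, not real locations):

```r
# Hypothetical hard coded path; substitute the path on your own computer.
data_path <- "/Users/your_name/pol90/my_data.csv"

# Check before reading so a moved file produces a readable message
if (file.exists(data_path)) {
  my_data <- read.csv(data_path)
} else {
  message("File not found: ", data_path)
  message("R is currently looking in: ", getwd())
}
```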
3.3.0.2 2. Change the R Working Directory (okay intermediate solution)
If you create a new folder for each of your own projects (such as a problem set or a final), one approach is to manually change the R working directory to the relevant folder. Or, if you are working on a team and do not want to use hard coded paths, one solution is for each person to change their R working directory manually to point to the folder that contains the Rmd (and data).
An easy way to do this is to go to RStudio -> SESSION menu -> SET WORKING DIRECTORY and then select one of the options.
If you have your relevant Rmd open, you can select SET WORKING DIRECTORY -> TO SOURCE FILE LOCATION
If you don’t have your relevant Rmd open, you can select SET WORKING DIRECTORY -> CHOOSE DIRECTORY and then manually pick the working directory.
The main downside of this approach is that you have to set the working directory by hand, over and over, rather than having it set automatically. In a class where you have lots of assignments or multiple team projects, this can be cumbersome and prone to error. For example, at the start of each week your working directory will likely still point to an old assignment folder.
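The same change can be made in code with setwd(). A minimal sketch, using a temporary folder as a stand-in for an assignment folder (project_dir and old_wd are hypothetical names):

```r
# Stand-in for your assignment folder; a temporary folder lets this run anywhere
project_dir <- tempdir()

# setwd() changes the working directory and returns the previous one
old_wd <- setwd(project_dir)
print(getwd())

# Files can now be written and read by name alone
write.csv(data.frame(x = 1:3), "my_data.csv", row.names = FALSE)
my_data <- read.csv("my_data.csv")

# Restore the original working directory when done
setwd(old_wd)
```

Note that a setwd() call with a full path has the same portability problem as any hard coded path: it only works on the computer where that folder exists.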
3.3.0.3 3. Create an RStudio Project (slightly more advanced but recommended)
RStudio has an option to create what it calls “R Projects” that automatically set the working directory to whatever folder the R Project resides in. Creating an R Project is simple. Go to RStudio -> FILE menu -> NEW PROJECT
If you need to create a NEW DIRECTORY where you want to do your work (such as where your Rmd, data, etc. will go), choose NEW DIRECTORY
If you already have a folder or directory where you want your work to go, choose EXISTING DIRECTORY
Once you have an RStudio project file created, there are two more simple steps:
First, when you want to open RStudio, DON’T directly open the application RStudio but, rather, go to the folder that has the R Project and open that (this will open RStudio with the relevant folder as the working directory). The icon for R Projects looks like a cube and, if you can see file extensions, the file name will end in .Rproj
Second, for any collaborative projects, I recommend you use an R package called here, which helps create file paths that work across different computers
- To use here, be sure to load library(here) at the top of your document
- When opening files, use here() around the file name as in:
  - When your data file is in the main working directory:
    anes <- read_dta(here("anes_timeseries_2020_gss_bridge_20220408.dta"))
  - OR if your data file is nested in another folder called something like “anes_data”:
    anes <- read_dta(here("anes_data/anes_timeseries_2020_gss_bridge_20220408.dta"))
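A sketch of how here() builds a path, assuming the here package is installed. The folder name anes_data comes from the example above; my_data.csv and data_path are placeholders. Passing the folder and file names as separate arguments is an alternative to writing the slash yourself.

```r
# Only run if the here package is available
# (otherwise install it first with install.packages("here"))
if (requireNamespace("here", quietly = TRUE)) {
  library(here)

  # The project root that here() detected
  print(here())

  # Root + folder + file, joined into one path
  data_path <- here("anes_data", "my_data.csv")
  print(data_path)

  # Check that the file is really there before reading it
  print(file.exists(data_path))
}
```

Because the path is built from the project root rather than typed out in full, the same line should work for each collaborator as long as everyone opens the same R Project folder.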
3.3.0.4 Additional Resources
For more on here and using a “Project Oriented Workflow” see Jenny Bryan’s post: https://www.tidyverse.org/blog/2017/12/workflow-vs-script/
More on “Project Oriented Workflow” from Jenny Bryan and Jim Hester: https://rstats.wtf/project-oriented-workflow.html
Also see Martin Chan’s “RStudio Projects and Working Directories: A Beginner’s Guide,” https://martinctc.github.io/blog/rstudio-projects-and-working-directories-a-beginner%27s-guide/