
Dr. Domenico Giusti
Paläoanthropologie, Senckenberg Centre for Human Evolution and Palaeoenvironment
install.packages("swirl") # Install the swirl package
library("swirl") # Load the swirl package in the R environment
swirl() # Start swirl
You are prompted to install a course, and a list of 5 options is offered. Select option '5. Don't install anything for me. I'll do it myself'. A GitHub page will open and the swirl session will close. We want to install the R_Programming_E course. Back in RStudio, type in the console:
install_course_github("swirldev", "R_Programming_E")
swirl() # Start swirl again
You should now see listed the course '1: R Programming E'
Would you like to inform someone about your successful completion of this lesson via email? Select '1. Yes', then type in your full name and my email address, domenico.giusti@uni-tuebingen.de. Your email client should open. Send that email.
Your operating system and RStudio provide simpler graphical tools for these sorts of tasks, but the ability to manipulate files programmatically is useful in reproducible research. It is also useful when working with dozens, thousands or millions of files.
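As a minimal sketch of manipulating files from R, the example below creates, renames, copies and lists a few files using base R functions only. The file names are made up, and everything happens in a temporary directory so that no project files are touched.

```r
# Work inside a fresh throwaway directory
sandbox <- tempfile("file_demo")
dir.create(sandbox)

# Create three empty files with hypothetical names
paths <- file.path(sandbox, c("a.txt", "b.txt", "c.txt"))
created <- file.create(paths)

# Rename one file, copy another, then list what is there
renamed <- file.rename(paths[1], file.path(sandbox, "a_renamed.txt"))
copied  <- file.copy(paths[2], file.path(sandbox, "b_copy.txt"))
listing <- list.files(sandbox)
print(listing)

# Clean up: remove the sandbox and everything in it
unlink(sandbox, recursive = TRUE)
```

Each of these functions returns TRUE or FALSE, so a script can check whether a step succeeded before moving on.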
If you have to repeat the same tasks more than [individual threshold], code them once!
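A small sketch of "code it once": the hypothetical function count_rows() is written a single time and then applied to every CSV file in a directory. The example files are generated on the fly in a temporary directory and are purely illustrative.

```r
# Hypothetical task: count the rows in a CSV file
count_rows <- function(path) {
  nrow(read.csv(path))
}

# Create a few small example files in a temporary directory
demo_dir <- tempfile("many_files")
dir.create(demo_dir)
for (i in 1:3) {
  write.csv(data.frame(x = seq_len(i)),
            file.path(demo_dir, paste0("file_", i, ".csv")),
            row.names = FALSE)
}

# Apply the same function to every file instead of repeating code by hand
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
row_counts <- vapply(csv_files, count_rows, integer(1))
names(row_counts) <- basename(csv_files)
print(row_counts)
```

The same pattern works unchanged whether the directory holds three files or three thousand.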
Complete swirl modules '3: Sequences of Numbers' and '4: Vectors' and submit your successful completion via email (domenico.giusti@uni-tuebingen.de).
swirl modules won't count towards your final grade, but completing them is highly recommended.
Managing your projects in a reproducible fashion doesn't just make your science reproducible, it makes your life easier.
@vsbuffalo
This image (and the previous one) was created by Scriberia for The Turing Way community and is used under a CC-BY licence
We’re going to create a new project in RStudio:
An .Rproj file is created in your project directory. All your data, plots and scripts will now be relative to the project directory.
Put each project in its own directory, which is named after the project.
/data
/figures
/src, /R
/text, /doc
README file
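The layout can also be created from R. The sketch below builds the skeleton in a temporary directory for illustration; in practice you would run the dir.create() calls from the project root. Using /src and /text rather than /R and /doc is an arbitrary choice made for this example.

```r
# Build the skeleton inside a temporary location (illustration only)
project_root <- file.path(tempfile("projects"), "test_rr")
dir.create(project_root, recursive = TRUE)

# Subdirectories from the layout; /R and /doc are the alternative names
subdirs <- c("data", "figures", "src", "text")
for (d in subdirs) {
  dir.create(file.path(project_root, d), showWarnings = FALSE)
}

# An empty README file, to be filled in by hand
readme_ok <- file.create(file.path(project_root, "README"))

print(list.files(project_root))
```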
Name all files to reflect their content or function.
For this exercise we are going to use the archdata package. "The archdata package provides several types of data that are typically used in archaeological research. It provides all of the data sets used in 'Quantitative Methods in Archaeology Using R' by David L Carlson."
You can get the whole collection of datasets by installing the package 'archdata':
install.packages("archdata")
The collection is released under the GNU General Public License (GPL3). Read the package reference manual.
Create a new .Rmd file in your RStudio 'test_rr' project.
# load the Acheulean dataset
install.packages("archdata") # install the package
library(archdata) # load the package
data(package="archdata") # show the data sets in package ‘archdata’
data("Acheulean") # load the Acheulean data set in the R Environment
?Acheulean # open the help page for the data set
# create a /data directory
getwd() # get working directory
list.files() # list files in the wd
dir.create("data") # create a /data directory
# download the data from Dropbox
url <- "https://www.dropbox.com/s/zpf2062jrdbhj2s/Acheulean.csv?raw=1"
# NOTE: add a raw=1 URL parameter or you will get an HTML preview page,
# not the file content itself
destfile <- "data/Acheulean.csv" # specify destination where file should be saved
download.file(url, destfile) # download the CSV file
# read the CSV file
Acheulean <- read.csv(file = "data/Acheulean.csv",
                      header = TRUE, sep = ",", dec = ".", row.names = 1)
# There are also functions to read Excel files.
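To see what the read.csv() arguments do without downloading anything, here is a minimal sketch on a made-up CSV written to a temporary file. The site names and counts are invented; only the column names HA and CL are borrowed from the Acheulean data.

```r
# Write a tiny, made-up CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,HA,CL",
             "SiteA,10,2",
             "SiteB,5,7"), csv_path)

# row.names = 1 turns the first column into row names instead of a variable
toy <- read.csv(file = csv_path, header = TRUE, sep = ",", dec = ".",
                row.names = 1)

print(rownames(toy))
print(names(toy))
```

With row.names = 1 the site labels no longer count as a data column, so column indices refer only to the remaining variables.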
First look at the dataset
View(Acheulean)
str(Acheulean) # data object structure
summary(Acheulean) # summary
"80% of data analysis is spent on cleaning and preparing data. And it’s not just a first step, but it must be repeated many times over the course of analysis as new problems come to light or new data is collected." Tidy Data
Data structure:
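One common structural issue is a "wide" table, in which the values of a variable (here, tool type) are spread across column headers. The sketch below reshapes made-up counts into "long" form, with one row per site and tool type, using base R's reshape(). The data and the column names tool_type and count are hypothetical.

```r
# Made-up counts in "wide" form: one row per site, one column per tool type
wide <- data.frame(site = c("SiteA", "SiteB"),
                   HA = c(10, 5),
                   CL = c(2, 7))

# Reshape to "long" form: one row per site/tool-type pair
long <- reshape(wide,
                direction = "long",
                varying = c("HA", "CL"),
                v.names = "count",
                timevar = "tool_type",
                times = c("HA", "CL"),
                idvar = "site")
rownames(long) <- NULL
print(long)
```

tidyr::pivot_longer() is a commonly used alternative for the same operation if the tidyr package is installed.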
"Real datasets can, and often do, violate the three precepts of tidy data in almost every way imaginable. While occasionally you do get a dataset that you can start analysing immediately, this is the exception, not the rule." Tidy Data
data(Acheulean)
# Compute percentages for each assemblage
Acheulean.pct <- prop.table(as.matrix(Acheulean[,3:14]), 1)*100
round(Acheulean.pct, 2)
plot(OST~HA, Acheulean.pct) # scatterplot of OST against HA percentages
boxplot(Acheulean.pct) # boxplots of the percentages, one box per column
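To make the prop.table() step concrete, here is a minimal sketch on invented counts. With margin = 1 each value is divided by its row total, so every assemblage (row) sums to 100 after multiplying by 100.

```r
# Made-up counts: rows are assemblages, columns are tool types
counts <- matrix(c( 5, 10, 35,
                   30, 20, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("SiteA", "SiteB"), c("HA", "CL", "OST")))

# Row-wise proportions, expressed as percentages
pct <- prop.table(counts, margin = 1) * 100
print(round(pct, 2))

# Each row should add up to 100
print(rowSums(pct))
```

Using margin = 2 instead would give column-wise percentages, that is, how each tool type is distributed across the assemblages.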