Open Science

Basic principles and best practices

Dr. Domenico Giusti
Paläoanthropologie, Senckenberg Centre for Human Evolution and Palaeoenvironment

Week 1: Open Concepts and Principles

Outline

  • Introduction to some of the best tools for doing Reproducible Research
    • R & RStudio
    • (R)Markdown
    • Git & GitHub
    • Zenodo, OSF

Reproducible research

The Open Scientist's Toolbox

Introduction to R & RStudio

Setup

If you don't have it already, now might be a good time to install R and RStudio to get started.

Download and install the R base system and RStudio. Both are needed. Installing RStudio will not automatically install R.

R

  • Programming language
  • Started as a statistics and data analysis environment
  • But can also build websites, run simulations, and lots of other things
  • R is what runs all of the code we will write this semester
  • Separate from RStudio

RStudio

  • IDE - Integrated Development Environment
  • Makes developing code in R easier
  • It brings several parts of code development together in one place:
    • Console - where R is actually running; you can work here “interactively”
    • Text editor - for writing scripts and dynamic documents that mix text and code (.R, .Rmd, .txt, ...)
    • Environment - shows the variables that currently exist and their values
    • History - lists the commands you’ve run
    • Files - create, delete, and rename files & folders in your project
    • Plots - graphics device that displays your figures

R packages

R and RStudio have functionality for managing packages.

From the console, you can install packages by typing:

install.packages("packagename")

You can make a package available for use with:

library("packagename")

Packages can also be conveniently viewed, loaded, and detached in the Packages tab of RStudio.
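When a script depends on several packages, a common pattern is to install only the ones that are missing and then load them all. The helper below is a minimal sketch of that pattern; the name use_packages() is hypothetical and is not part of base R or RStudio.

```r
# Hypothetical helper (not a base R function): install any missing packages
# from CRAN, then attach all of them.
use_packages <- function(pkgs) {
  # requireNamespace() returns FALSE, without an error, if a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  # character.only = TRUE lets library() take a package name stored in a variable
  for (pkg in pkgs) {
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage: base packages such as "stats" and "utils" are always installed,
# so this call only loads them
use_packages(c("stats", "utils"))
```

Checking with requireNamespace() first avoids re-downloading packages every time the script runs.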

Assignments

"The swirl R package makes it fun and easy to learn R programming and data science. If you are new to R, have no fear."

# this is a comment
install.packages("swirl") # install the package
library("swirl") # load the package
swirl() # start swirl

The first time you start swirl, you'll be prompted to install a course. Select option 5, "Don't install anything for me. I'll do it myself." The GitHub collection of interactive courses will open in your web browser. Back in the console, type:

install_course_github("swirldev", "R_Programming_E")

Then run swirl() again, choose the course you just installed, and complete lessons 1 and 2 (submit to domenico.giusti@uni-tuebingen.de).

Introduction to Markdown & RMarkdown

In the beginning was Literate programming

Year Event
1992 Donald Knuth publishes the book "Literate Programming" (he had introduced the concept in 1984), describing it as "that (which) combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably more fun to write than programs that are written only in a high-level language. The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer."
2002 Friedrich Leisch introduces Sweave, a program for "Dynamic generation of statistical reports using literate data analysis"
2004 John Gruber creates the Markdown language in collaboration with Aaron Swartz. Their goal was to "write using an easy-to-read, easy-to-write plain text format, and optionally convert it to structurally valid XHTML (or HTML)"
2012 knitr R package released - knitr was inspired by Sweave
2014 rmarkdown R package released - extends Markdown to work with the R/RStudio environment

rmarkdown

  • rmarkdown extends Markdown, a language originally designed for writing documents for the Web (i.e. HTML).

  • rmarkdown uses Pandoc to convert documents into many formats: HTML (readable by web browsers), DOCX (Microsoft Word, Google Docs), ODT (LibreOffice), PDF (Portable Document Format), EPUB (e-books), HTML5 slide shows (Slidy, ioslides), and TeX-based documents and slides (Beamer).
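As a sketch of this multi-format conversion, the code below writes a tiny R Markdown file to a temporary folder and renders the same source to HTML, Word (DOCX), and ODT in a single call to rmarkdown::render(). It assumes the rmarkdown package and Pandoc are available (RStudio ships with both); the file name demo.Rmd and its content are made up for illustration.

```r
# Assumes rmarkdown and Pandoc are installed; demo.Rmd is a throwaway example file.
demo_rmd <- file.path(tempdir(), "demo.Rmd")
writeLines(c(
  "---",
  "title: \"Demo\"",
  "---",
  "",
  "Some **bold** text and an inline result: `r 1 + 1`."
), demo_rmd)

if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  # A character vector of format names renders the same source once per format
  outputs <- rmarkdown::render(
    demo_rmd,
    output_format = c("html_document", "word_document", "odt_document"),
    quiet = TRUE
  )
  print(basename(outputs))
}
```

The document content stays the same; only the output_format argument decides which Pandoc writer is used.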

rmarkdown

"The rmarkdown package helps you create dynamic analysis documents that combine code, rendered output (such as figures), and prose. You bring your data, code, and ideas, and R Markdown renders your content into a polished document that can be used to:

  • Do data science interactively within the RStudio IDE,
  • Reproduce your analyses,
  • Collaborate and share code with others, and
  • Communicate your results with others.

R Markdown documents can be rendered to many output formats including HTML documents, PDFs, Word files, slideshows, and more, allowing you to focus on the content while R Markdown takes care of your presentation".

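How rmarkdown works

rmarkdown first passes the .Rmd file to knitr, which runs the R code chunks and writes a plain Markdown (.md) file containing the results; Pandoc then converts that Markdown into the final output format.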

rmarkdown installation

"The easiest way to install the rmarkdown package is from within the RStudio IDE, but you don’t need to explicitly install it or load it, as RStudio automatically does both when needed. A recent version of Pandoc (>= 1.12.3) is also required; RStudio also automatically includes this too so you do not need to download Pandoc if you plan to use rmarkdown from the RStudio IDE."
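The two stages behind rendering can also be run by hand, which makes the pipeline visible: knitr executes the code chunks and writes plain Markdown, then Pandoc converts that Markdown into the target format. This is a sketch for illustration only, assuming the knitr and rmarkdown packages are installed; in everyday use rmarkdown::render() performs both steps for you. The file name pipeline.Rmd is arbitrary.

```r
# Stage 0: check that the tools are present and Pandoc is recent enough
has_tools <- requireNamespace("knitr", quietly = TRUE) &&
  requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available("1.12.3")

if (has_tools) {
  print(rmarkdown::pandoc_version())

  # A minimal R Markdown source with one R code chunk (hypothetical example file)
  pipeline_rmd <- file.path(tempdir(), "pipeline.Rmd")
  writeLines(c("# A tiny report", "", "```{r}", "summary(cars$speed)", "```"),
             pipeline_rmd)

  # Stage 1: knitr runs the chunk and writes plain Markdown (.md) with the results
  md_file <- knitr::knit(pipeline_rmd,
                         output = sub("\\.Rmd$", ".md", pipeline_rmd),
                         quiet = TRUE)

  # Stage 2: Pandoc converts the Markdown into HTML
  html_file <- sub("\\.md$", ".html", md_file)
  rmarkdown::pandoc_convert(md_file, to = "html", output = html_file)

  # Show the first few lines of the generated HTML
  cat(head(readLines(html_file), 5), sep = "\n")
}
```

Opening the intermediate .md file shows the code and its printed output already merged as ordinary Markdown, which is all Pandoc ever sees.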

Markdown syntax

# Heading level 1

Heading level 1

## Heading level 2

Heading level 2

### Heading level 3

Heading level 3

#### Heading level 4

Heading level 4

The Markdown Guide

**Bold text**

Bold text

_Italic text_

Italic text

1. First item
2. Second item
3. Third item
    1. First item
    2. Second item
    3. Third item
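A quick way to check what this syntax produces is to convert it to HTML and inspect the tags. The sketch below assumes the commonmark package (an R interface to a CommonMark parser, not part of rmarkdown). rmarkdown itself uses Pandoc, whose Markdown dialect differs in a few details, so treat the result as an approximation. The nested items are indented by four spaces, which both parsers accept as a sub-list.

```r
# Assumes the commonmark package is installed: install.packages("commonmark")
md_source <- c(
  "## Heading level 2",
  "",
  "**Bold text** and _Italic text_",
  "",
  "1. First item",
  "2. Second item",
  "3. Third item",
  "    1. First item",
  "    2. Second item"
)

if (requireNamespace("commonmark", quietly = TRUE)) {
  # Join the lines into one Markdown string and convert it to an HTML string
  html_out <- commonmark::markdown_html(paste(md_source, collapse = "\n"))
  cat(html_out)
}
```

In the printed HTML, ## becomes an h2 element, ** becomes strong, _ becomes em, and the indented items appear as a second ol nested inside the third list item.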

Introduction to Git & GitHub

Version control system

Assignments

Introduction to Zenodo

Zenodo is a project funded by CERN, OpenAIRE, and the EU (Horizon 2020).

"The OpenAIRE project, in the vanguard of the open access and open data movements in Europe was commissioned by the EC to support their nascent Open Data policy by providing a catch-all repository for EC funded research. CERN, an OpenAIRE partner and pioneer in open source, open access and open data, provided this capability and Zenodo was launched in May 2013." Zenodo

Assignments

  • Read the Zenodo principles (ideally after the 2nd lecture)
  • Sign up for Zenodo using your GitHub account

Introduction to COS & OSF

Cheatsheets