library(nlme)
library("dygraphs")

Berlin School of Economics and Law

BSEL

Data Science Activities

BIPM at BSEL

Outline

Why R is the best data science language to learn today

R consistently ranks among the best languages

  • IEEE: R ranks #5
  • O’Reilly: R is arguably the most common data programming language
  • Redmonk: R is #12
  • TIOBE: R ranks high, with a consistent upward trend

R is excellent for learning data science

R is a true “data language”

R is a language that has statistics and data built into its DNA, so to speak.

In this sense, R is nearly unique among programming languages: it was built for statistics and designed for data.

This has advantages when you’re learning data science, because almost any statistical test or technique can be found somewhere within base R or one of its packages.
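As a small illustration of that point, the sketch below runs a classical test and fits a regression using only base R and its bundled example datasets (`sleep` and `mtcars`). The choice of datasets and the variable names `welch` and `fit` are assumptions made for this example, not part of the original slides.

```r
# Base R only: the stats and datasets packages are attached by default.

# Welch two-sample t-test: does the extra sleep differ between the two groups?
welch <- t.test(extra ~ group, data = sleep)
print(welch$p.value)

# Simple linear regression: fuel economy (mpg) as a function of weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```

Both calls return ordinary R objects, so the p-value, the coefficients, and the other components can be pulled out and reused in later steps.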

The best books and resources use R

This is important. If you’re a beginner, and you’re just getting started in data science, you’ll have a lot to learn. To truly master data science, you’ll need to learn several sub-areas like probability, statistics, data visualization, data manipulation, and machine learning. All of these skill areas have theoretical foundations (which you’ll need to learn) but also practical techniques that you’ll need to execute by writing code.

Strong in Academia AND Industry

R is in heavy use at several of the best companies that are hiring data scientists.

As Revolution Analytics recently noted, “R is also the tool of choice for data scientists at Microsoft, who apply machine learning to data from Bing, Azure, Office, and the Sales, Marketing and Finance departments.”

Beyond tech giants like Google, Facebook, and Microsoft, R is in use at a wide range of companies, including Bank of America, Ford, TechCrunch, Uber, and Trulia.

Media Exposure

Community

Reproducible Research

“The term reproducible research refers to the idea that the ultimate product of academic research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research”
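In R, two small habits go some way toward this idea: fixing the random seed so that simulated results can be regenerated exactly, and recording the session details alongside the output. The following is a minimal sketch only; the seed value and the variable names are arbitrary choices for illustration, and a full reproducible workflow would also ship the data and the complete analysis script.

```r
# Fix the random seed so the "random" draws can be regenerated exactly.
set.seed(2024)
first_run <- rnorm(5)

# Resetting the same seed reproduces the same numbers.
set.seed(2024)
second_run <- rnorm(5)
print(identical(first_run, second_run))

# Record the computational environment (R version, platform, attached packages).
env <- sessionInfo()
print(env$R.version$version.string)
```

Saving the output of `sessionInfo()` next to the results gives a reader the R version and the packages that were attached when the analysis ran.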