R is a very popular language for analytics, and particularly statistics. R has a number of functions for reading in data, but most of them expect a delimited text file (such as a .CSV file) as input. That's great if your existing data is in a spreadsheet, but large amounts of data are usually stored in a relational database. If you work for a large company, chances are that it is an Oracle database.
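As a sketch of what reading from a relational database looks like in R, the example below uses the DBI package with an in-memory SQLite database (via the RSQLite package) as a stand-in, because it runs without a database server. This is a swapped-in setup, not the Oracle connection the post describes; for Oracle you would use an Oracle-capable DBI driver (for example from the ROracle or odbc packages) with your own connection details. The table and query are illustrative only.

```r
# Assumes the DBI and RSQLite packages are installed.
# SQLite is a stand-in here; an Oracle connection would use a different driver.
library(DBI)

# Open a temporary in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy R's built-in mtcars data frame into a table so there is something to query.
dbWriteTable(con, "mtcars", mtcars)

# The SQL runs on the database side; the result comes back as an ordinary data frame.
six_cyl <- dbGetQuery(con, "SELECT mpg, cyl, hp FROM mtcars WHERE cyl = 6")
print(six_cyl)

# Close the connection when you are done.
dbDisconnect(con)
```

The filtering happens in SQL, so only the rows you ask for are pulled into R, which matters when the table is large.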
This week, we return to our “Getting Started With R” series. Today we are going to look at some tools from the “dplyr” package. Hadley Wickham, the creator of dplyr, calls it “A Grammar of Data Manipulation.”
filter()
Use filter() to subset your data by rows. It takes one or more logical expressions as input and returns only the rows for which all of those expressions are TRUE; rows where an expression evaluates to FALSE or NA are dropped.
To demonstrate, let’s start by loading the tidyverse library (which includes dplyr), and we’ll also load the gapminder data.
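A minimal sketch of that setup followed by a filter() call, assuming the tidyverse and gapminder packages are installed. The condition (Canada, years after 2000) is just an illustration, not necessarily the example the post goes on to use.

```r
# Assumes the tidyverse (which includes dplyr) and gapminder packages are installed.
library(tidyverse)
library(gapminder)

# Comma-separated conditions are combined with AND:
# keep rows where the country is Canada and the year is after 2000.
canada_recent <- filter(gapminder, country == "Canada", year > 2000)
print(canada_recent)

# The same subset written with the pipe and an explicit &.
canada_recent2 <- gapminder %>% filter(country == "Canada" & year > 2000)
```

Because gapminder records data every five years from 1952 to 2007, this returns two rows, for 2002 and 2007.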
Today we are going to digress from our ongoing “Intro to R” series, and talk about a subject that’s been on my mind lately: sample sizes.
An important question when designing an experiment is “How big a sample do I need?” A larger sample gives more precise results, but it costs more to collect. Use too small a sample, and you may get inconclusive results; too large a sample, and you’re wasting resources.
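One common way to put a number on this, sketched here for a two-group comparison of means, is base R's power.t.test(). The differences, standard deviation, significance level, and power below are assumptions chosen for illustration, and this is not necessarily the approach the post itself takes.

```r
# power.t.test() is in the stats package, which R loads by default.
# Leaving n unspecified asks R to solve for the sample size per group.

# Hypothetical design: detect a difference of 1 unit when the standard deviation is 1,
# using a two-sided, two-sample t-test at the 5% level with 80% power.
big_effect <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.80)

# Same design, but trying to detect a difference half as large.
small_effect <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)

# Round up, since you cannot recruit a fraction of a subject.
ceiling(big_effect$n)    # subjects needed per group
ceiling(small_effect$n)  # subjects needed per group
```

Halving the difference you want to detect roughly quadruples the required sample, which is exactly the cost trade-off described above.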
The data frame is the primary structure for working with data in R. Whenever you have data that is arranged in a spreadsheet-like fashion, the default receptacle for that data in R is the data frame. In a data frame, each column contains measurements on one variable, and each row contains measurements on one case. All of the data in a column must be of the same type (for example numeric, character, or logical).
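A small illustration of these rules, using made-up values: each column below holds a single type, and the last lines show what happens when you try to mix types in one vector (R coerces everything to a common type, here character).

```r
# Made-up example data, for illustration only.
people <- data.frame(
  name    = c("Ann", "Bob", "Cara"),  # character column
  age     = c(34, 28, 45),            # numeric column
  student = c(FALSE, TRUE, FALSE),    # logical column
  stringsAsFactors = FALSE            # keep text as character (the default since R 4.0)
)

# One row per case, one column per variable.
str(people)

# Each column is a vector with a single type.
sapply(people, class)

# Mixing types in one vector forces a common type: the number becomes text.
mixed <- c(34, "twenty-eight")
class(mixed)
```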
This week, we are going to talk about using git and GitHub with RStudio to manage your projects.
Git is a version control system, originally designed to help software developers work together on big projects. Git works with a set of files, which it calls a “repository,” to manage changes in a controlled manner. Git also works with websites like GitHub, GitLab, and Bitbucket, to provide a home for your git-based projects on the internet.
Last week, we installed R and RStudio, and we tried out a few simple R commands in the console. But using R in interactive mode, while powerful, has some limits. Today we are going to learn how to use R as a programming language, and we will write our first R script. But first, let’s look at how we can use RStudio to keep our work organized.
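For a sense of what a first script can look like, here is a minimal example. The file name first_script.R, the temperature values, and the helper function f_to_c() are hypothetical, chosen only for illustration.

```r
# first_script.R  (hypothetical file name)
# A script is a plain-text file of R commands, run from top to bottom.

# Store some values in a variable (made-up temperatures in Fahrenheit).
temps_f <- c(68, 75, 59, 82)

# Define a small helper function that converts Fahrenheit to Celsius.
f_to_c <- function(f) {
  (f - 32) * 5 / 9
}

# Apply the function to every value, round to one decimal place, and print.
temps_c <- round(f_to_c(temps_f), 1)
print(temps_c)
```

Save the file in your project folder and run it with source("first_script.R") from the console, or with Rscript first_script.R from a terminal.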
Why learn R? There are a lot of tools available for doing data analytics, data science, or statistical analysis. So why should you choose R? I’ll answer that by contrasting R to some of my other favorite tools.
If you want to create data visualizations, Tableau is an amazing tool. With a few mouse clicks you can create anything from a bar chart to a heat map. The graphics it produces are beautiful and you don’t need to know any programming.