You CAN use Excel to do Statistics, but should you?

When I was an undergraduate, I learned to distrust the statistical functions in Excel. Back then, we didn’t use software in statistics class; we had to learn how to do all the calculations with paper, pencil, and a basic four-function calculator. When we were taught to calculate quartiles, a classmate discovered that Excel gave the “wrong” answer for some of our homework problems. Our instructor just said, “Don’t use Excel, use the method I taught you.”

It would have been nice if he had told us, to quote Wikipedia, “…there is no universal agreement on selecting the quartile values.”[1] After all, this was the same professor who made us learn a half-dozen different post-hoc tests to use with ANOVA. But in his mind, there was only one way to calculate quartiles.
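To make the disagreement concrete, here is a minimal sketch in Python that computes quartiles three ways on a made-up data set. The "median of halves" function is my assumption about what a typical textbook method looks like, not necessarily the one my professor taught. The two `statistics.quantiles` methods are from Python's standard library; as far as I can tell they line up with Excel's QUARTILE.EXC and QUARTILE.INC, but treat that mapping as an assumption rather than a guarantee.

```python
import statistics


def median_of_halves_quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumed textbook-style method: sort the data, split it in half, and
    take the median of each half. With an odd number of values, the middle
    value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[-half:]
    return statistics.median(lower), statistics.median(upper)


# Hypothetical homework data, chosen only for illustration.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1_excl, _, q3_excl = statistics.quantiles(scores, n=4, method="exclusive")
q1_incl, _, q3_incl = statistics.quantiles(scores, n=4, method="inclusive")
q1_half, q3_half = median_of_halves_quartiles(scores)

print("median of halves:", q1_half, q3_half)
print("exclusive method:", q1_excl, q3_excl)
print("inclusive method:", q1_incl, q3_incl)
```

On this data the three methods give three different values for the first quartile (3, 2.75, and 3.25). None of them is wrong; they simply follow different conventions.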

In my tutoring work, I’m increasingly finding students who are in classes where Excel is the default tool for statistical analysis. Most of these are in business schools, which makes sense. As managers, they will all have Excel on their desktops, and very few will have access to SAS or SPSS, or even R. I wondered, though, if Excel’s algorithms were up to the task.

Not surprisingly, I wasn’t the first one to consider this. In 1998, McCullough discussed methods for evaluating the statistical accuracy of software.[2] However, he only evaluated SAS, SPSS, and S-Plus. In 1999, McCullough partnered with Wilson to apply his methodology to Excel 97.[3] They concluded, “Excel has been found inadequate…. We advise that Excel not be used for statistical calculations.”
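Why would the choice of algorithm matter for something as basic as a variance? The sketch below is only an illustration of one well-known numerical pitfall. It is not the test procedure from the cited papers, and I am not claiming it reproduces any specific Excel result. It compares the one-pass "calculator" formula for the sample variance with the two-pass formula, using made-up data that has a large mean and a tiny spread. The function names are hypothetical.

```python
def variance_one_pass(values):
    """Sample variance via the 'calculator' formula.

    Computes (sum of x^2 - (sum of x)^2 / n) / (n - 1). Algebraically
    correct, but it subtracts two huge, nearly equal numbers, so rounding
    error can swamp the answer.
    """
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(values):
    """Sample variance via deviations from the mean (two passes over the data)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Made-up data: large mean, tiny spread. The true sample variance is 0.01.
data = [100000000.1, 100000000.2, 100000000.3]

naive = variance_one_pass(data)
stable = variance_two_pass(data)

print("one-pass formula:", naive)
print("two-pass formula:", stable)
```

The two-pass version lands very close to 0.01, while the one-pass version does not, even though both formulas are mathematically equivalent. Accuracy studies are designed to catch exactly this kind of gap between a formula on paper and its behavior in floating-point arithmetic.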

By 2011, Microsoft had improved the statistical algorithms in Excel. Keeling and Pavur compared six different spreadsheet packages, along with SAS and R. They looked at two different versions of Excel and found that while there were serious problems with Excel 2007, they got good results with Excel 2010.[4]

My concerns about using Excel for statistical analysis were valid back in my undergraduate days. I’m happy to learn that the product has been improved, and Excel is now a useful tool for conducting statistical analysis. Given the ubiquity of Excel, that’s a very good thing.

[1] https://en.wikipedia.org/wiki/Quartile

[2] McCullough, B. D. (1998). Assessing the Reliability of Statistical Software: Part I. The American Statistician, 52(4), 358-366. doi:10.2307/2685442

[3] McCullough, B. D., & Wilson, B. (1999). On the Accuracy of Statistical Procedures in Microsoft Excel 97. Computational Statistics and Data Analysis, 31(1), 27-37. doi:10.1016/S0167-9473(99)00004-3

[4] Keeling, K., & Pavur, R. (2011). Statistical Accuracy of Spreadsheet Software. The American Statistician, 65(4), 265-273. Retrieved from http://www.jstor.org.ezproxy.libproxy.db.erau.edu/stable/23339556

Gerald Belton
Statistician, Adjunct Instructor
