Statistical Software Used in Epidemiology
Created | Updated Jan 7, 2012
Epidemiology can be defined as...
The study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to the control of health problems. - Stedman's Concise Medical Dictionary
Back in the 'olden' days, epidemiologic research was conducted without the benefit of computers and specialised software programs. From listening to those involved in such antiquated research, it sounds as if square kilometres of index cards must have been used in tracking subject data, and entire notebooks consumed as individual equations were solved by hand. While some of these legends are likely tall tales of the 'I walked to school in the snow, uphill both ways' variety, there is no doubt that epidemiologists of yore were, for want of computers, limited in the complexity of the analyses they could attempt.
Epidemiologists today use a wide variety of software programs. Some are fiercely loyal to one program or another, while other epidemiologists hop their data between programs depending on what function they need to use. In many cases, the software used by a given epidemiologist is more a function of where and when they went to graduate school than it is a matter of which software program is best suited to their needs. The programs below are presented in alphabetical order, as no preference is implied.
Access
Access isn't a statistical package, per se; it is really a database program. However, many epidemiologists use Access for their data collection needs, especially if they need to have data fed into a server from many sites. Access can make really pretty little tables and reports, but to do real statistical analyses, you need to transfer the data into another program. Stat Transfer is the easiest program to use for transferring data from Access to other programs.
EpiInfo
EpiInfo is a free software package, which is great. It can be downloaded from the Centers for Disease Control. It is DOS-based, however, which is not always so great. EpiInfo is made up of several different modules, one of the handiest being StatCalc. StatCalc allows you to calculate relative risks and odds ratios, along with their confidence intervals, and will also perform sample size and power calculations.
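For readers curious about what a calculator of this kind works out, here is a minimal sketch in Python. It is not StatCalc's own code, and StatCalc may well use different interval methods. The function names are made up for illustration. The intervals are the standard large-sample (Wald) intervals on the log scale, and the sample size formula is the usual normal-approximation one for comparing two proportions, without continuity correction.

```python
import math
from statistics import NormalDist


def risk_measures(a, b, c, d, confidence=0.95):
    """Relative risk and odds ratio from a 2x2 table, with Wald intervals.

    Table layout (all four cells must be non-zero):
                    diseased   not diseased
      exposed          a            b
      unexposed        c            d
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)

    # Standard errors of the log estimates (large-sample approximations).
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def interval(estimate, se):
        # Build the interval on the log scale, then transform back.
        return (estimate * math.exp(-z * se), estimate * math.exp(z * se))

    return {
        "relative_risk": relative_risk,
        "relative_risk_ci": interval(relative_risk, se_log_rr),
        "odds_ratio": odds_ratio,
        "odds_ratio_ci": interval(odds_ratio, se_log_or),
    }


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Subjects needed per group to detect p1 versus p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed fall ill.
    print(risk_measures(20, 80, 10, 90))
    print(sample_size_two_proportions(0.20, 0.10))
```

With the made-up cohort above, the relative risk is 2.0 and the odds ratio is 2.25. Detecting a difference between 20% and 10% with 80% power at the 5% level needs 199 subjects per group under this approximation.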
Excel
While Excel was designed to be a spreadsheet package rather than statistical software, many fields have found ways to use it for quite complex analyses. Between the use of formulas, macros, Visual Basic, and the built-in statistical functions, the depth of Excel exceeds what most people realise. However, while Excel can be ideal for some types of modelling or demographic analyses, most epidemiologists have found other statistical packages to be more suited to their needs. Even so, once the data is analysed, it's often Excel that the epidemiologist will turn to for the production of tables and graphs. Excel is available on both Windows and Macintosh platforms.
SAS
SAS is perhaps the 'heaviest duty' statistical package that epidemiologists commonly deal with, and is the best suited for manipulating incredibly large, raw datasets. Some epidemiologists will deal with data management issues within SAS, and then use Stat Transfer to move the dataset into another package's format for analysis. SAS programming will seem simple to those who are familiar with Unix environments or any type of programming, but may be too much for those who are used to a point-and-click environment. SAS is available on both Windows and Unix platforms.
SPSS
SPSS strikes a good balance between a user-friendly interface and a breadth of statistical functions. While SPSS cannot correctly adjust for the more complicated correlated, longitudinal, and clustered datasets as Stata and Sudaan can, it can handle the statistical needs of most users. Furthermore, its point-and-click interface is more intuitive than the command-line interface required by SAS, Stata, and Sudaan. SPSS is available on both Windows and Macintosh platforms.
Stata
Stata is probably the most user-friendly statistical software to use a command-line interface. It's capable of dealing with a wide variety of 'funky data' issues, such as clustering, unequally weighted data, and multiple imputation. Many statistical textbooks include examples from Stata or SPSS, both in the text and in attached disks or CD-ROMs. The graphs drawn by Stata are ugly enough that they're not really worth bothering with. Stata is available on Windows, Unix and Macintosh platforms.
Stat Transfer
The Stat Transfer program, produced by Circle Systems and also sold by the makers of Stata, allows a user to translate data between virtually all imaginable data formats. This is incredibly valuable, since essentially every software program saves data in a different format and they can't use each other's formats. It's not unheard of for epidemiologists to get in the habit of using one software package for data management, another for descriptive analyses, another for regression analyses, and another for tables and graphs. In such cases it is vital to be able to quickly and easily translate between data formats.
Sudaan
Sudaan was designed specifically to deal with funky data samples: samples that were cluster-correlated, or that had unequal weights or multi-stage sample designs. Only a few years ago, data that had these complications needed Sudaan, and that meant a statistician who really understood computers. These days, however, more user-friendly statistical programs like Stata are also equipped to handle such complicated issues. On the other hand, using Sudaan does imply to statistical journals that you probably really know what you're doing.
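To give a feel for why unequal weights and clustering matter, here is a small Python sketch. It is not how Sudaan or Stata do it. It covers only a weighted point estimate and the textbook design effect for equal-sized clusters, 1 + (m - 1) x ICC. The proper variance estimation these packages perform for complex designs is considerably more involved and is not attempted here. The function names and the example figures are invented for illustration.

```python
def weighted_mean(values, weights):
    """Mean of `values` where each observation counts `weights[i]` times."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def design_effect(cluster_size, icc):
    """Variance inflation from sampling equal-sized clusters.

    `icc` is the intracluster correlation coefficient.
    """
    return 1 + (cluster_size - 1) * icc


if __name__ == "__main__":
    # Hypothetical survey: two cases come from an oversampled group
    # (each stands for 10 people); two non-cases each stand for 100 people.
    has_disease = [1, 1, 0, 0]
    sampling_weights = [10, 10, 100, 100]

    naive = sum(has_disease) / len(has_disease)
    weighted = weighted_mean(has_disease, sampling_weights)
    print(f"unweighted prevalence: {naive:.3f}")
    print(f"weighted prevalence:   {weighted:.3f}")

    # 1000 subjects sampled in clusters of 20 with an assumed ICC of 0.05.
    deff = design_effect(20, 0.05)
    print(f"design effect: {deff:.2f}")
    print(f"effective sample size: {1000 / deff:.0f}")
```

In this made-up example, ignoring the weights gives a prevalence of 50%, while the weighted estimate is about 9%. That gap is the sort of thing that makes analysing such data in a package unaware of the design a bad idea.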
Specialised Areas
Subspecialties within epidemiology, such as genetic epidemiology, have their own types of software - like SAGE, Linkage, or Epicenter.