Author: Bailey DeBarmore
In a previous post, I talked you through how to get your data clean in Excel before importing to Stata, SAS, or R [read that post here]. The next step is to import that data and start creating variables you need for analysis and running descriptives. I'll go through how to import your data in this post.
Feel free to post questions in the comments!
Author: Bailey DeBarmore
A task I often help with is getting survey data prepped for analysis. Typically, a client has distributed a written survey via Qualtrics or SurveyMonkey and has downloaded the survey results to Excel as an xls, xlsx, or csv file.
Before importing that data into Stata, SAS, or R [read that post here], there are a few steps you should do first. In this short tutorial post, I'll walk you through those steps. I highly recommend reading through this post in full before touching your data. Get an overview of what you'll need to do, and then read through again to let the gears turn on how you'll need to clean your own data. Feel free to post any questions in the comments! Let's get started.
Author: Bailey DeBarmore
Do you want to model trajectories by calculating transition probabilities? You can do this in Stata with just a little extension of your longitudinal data analysis skills. We'll be using xttrans (built in to Stata) or xttrans2 (a module you can download). Your data will need to be in LONG format. To reshape from wide to long, use:
reshape long stub, i(id) j(time)
where stub is the stubname of your variable, i() names the variable that uniquely identifies observations in your wide data set, and j() names the new time variable.
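For readers who want to see the mechanics outside Stata, here is a minimal Python sketch of the two steps described above: reshaping wide data to long, then tabulating transition probabilities between consecutive observations for each id. It is not the xttrans implementation. The function names, the dictionary layout of the wide data, and the choice to treat consecutive observations as one transition are all assumptions made for illustration.

```python
from collections import defaultdict


def reshape_long(wide_rows, stub, id_key="id"):
    """Turn wide rows like {"id": 1, "state1": "A", "state2": "B"}
    into long (id, time, value) tuples, sorted by id and time.
    (Hypothetical helper; mirrors the idea of `reshape long stub, i(id) j(time)`.)"""
    long_rows = []
    for row in wide_rows:
        for key, value in row.items():
            suffix = key[len(stub):]
            if key.startswith(stub) and suffix.isdigit():
                long_rows.append((row[id_key], int(suffix), value))
    return sorted(long_rows, key=lambda r: (r[0], r[1]))


def transition_probabilities(long_rows):
    """Row-normalized transition table: probs[origin][destination].
    Assumes each pair of consecutive observations within an id is one transition."""
    counts = defaultdict(lambda: defaultdict(int))
    rows = sorted(long_rows, key=lambda r: (r[0], r[1]))
    for prev, curr in zip(rows, rows[1:]):
        if prev[0] == curr[0]:  # only count transitions within the same id
            counts[prev[2]][curr[2]] += 1
    probs = {}
    for origin, dests in counts.items():
        total = sum(dests.values())
        probs[origin] = {dest: n / total for dest, n in dests.items()}
    return probs


# Illustrative (made-up) wide data: one row per id, one column per time point.
wide = [
    {"id": 1, "state1": "A", "state2": "B", "state3": "B"},
    {"id": 2, "state1": "A", "state2": "A"},
]
long_rows = reshape_long(wide, stub="state")
probs = transition_probabilities(long_rows)
```

Each row of the resulting table sums to 1, so `probs["A"]["B"]` reads as the share of observations in state A that were in state B at the next observation.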
Author: Bailey DeBarmore
You may find yourself running a multinomial logistic regression but unsure how to interpret your output. I get these questions a lot from students, so I'm here to help demystify your Stata results.
Running the regression
To run a multinomial logistic regression, you'll use the command mlogit.
As the code below shows, the syntax for the command is mlogit, followed by the outcome variable and your covariates, then a comma, and then base(#). In this example I have a 4-level variable, hypertension (htn). I want the reference category, or the base outcome, to be normal BP, which corresponds to htn=0, so I'll use base(0) in my code.
Author: Bailey DeBarmore
Learning about a method in class, like inverse probability weighting, is different from implementing it in practice.
This post will remind you why we might be interested in propensity scores to control for confounding (specifically inverse probability of treatment weights and SMR) and then show how to do so in SAS and Stata. If you have corresponding code in R that you'd like to add to this post, please contact me.
A note about weighting versus multivariable regression: When you use weighting, effect estimates are interpreted as marginal effects in the target population. When you adjust for covariates in a regression model, you are interpreting a conditional effect, that is, the effect of the exposure holding the covariates constant (conditional on them). Conditional estimates are troublesome with time-varying covariates because we run into collider bias and conditioning on mediators, so weights are preferable. In simpler situations, using weights instead of multivariable regression can help with convergence issues.
Files to download: a .txt file with SAS and Stata code, as well as a PDF version of this post with code (perfect for students), is available at the end of the post or on my GitHub.
Trying to figure out how to model a categorical predictor in your regression?
Written this code a million times but can never remember the syntax for the class statement? Want to generate exponentiated estimates and confidence intervals? We'll give examples for binary, 3 levels, 4 levels, and stratified.
Author: Paul Zivich
Python is a general-purpose programming language that has recently gained popularity among data scientists for its versatility, ability to quickly process large data sets, and large library of machine learning models. I taught myself Python two years ago, and while there are several Python libraries for epidemiology, I found that they were no longer actively maintained, did not interact with pandas (the main data management Python library), or did not implement causal inference methods (like inverse probability weights). To fill this gap, I created zEpid with the goal of making epidemiologic analyses in Python ez.
Author: Bailey DeBarmore
Short post today on how to use the MEAN function in SAS 9.4. Let's get started.
It seems like every time I need to calculate a mean variable in SAS, I find myself looking up which CALL functions deal with missing values in this way, and which in that way. For example, blood pressure readings are often taken 3 times, and then we average those 3 readings together for a mean value. In some code I ran earlier this morning, I kept getting negative values in my "avg_bp" variable. What's up with that?
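The excerpt above does not say what caused the negative averages. One common culprit in survey data, assumed here purely for illustration, is a numeric placeholder for "missing" (such as -999) that gets averaged in as if it were a real reading. The Python sketch below shows that idea with a hypothetical helper and made-up readings; it is not the SAS code from the full post.

```python
def mean_ignoring_missing(values, missing_codes=()):
    """Average the non-missing values; return None if nothing is left.
    `None` stands in for a true missing value; `missing_codes` lists any
    numeric placeholders (an assumption for this example, e.g. -999)."""
    valid = [v for v in values if v is not None and v not in missing_codes]
    return sum(valid) / len(valid) if valid else None


# Made-up systolic readings: the second one was recorded as the placeholder -999.
readings = [120, -999, 118]

# Treating -999 as a real number drags the average below zero.
naive_avg = sum(readings) / len(readings)

# Declaring -999 as a missing code averages only the two real readings.
clean_avg = mean_ignoring_missing(readings, missing_codes=(-999,))
```

The design choice is to decide explicitly which values count as missing before averaging, rather than relying on the averaging step to sort it out.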
Author: Bailey DeBarmore
While I'm not a big fan of p values, sometimes your coauthors, reviewers, or editors ask for them. In this post I'll show you how to calculate p for trend for ordered categories, like in a Table 1, and for adjusted odds ratios or similar regression.
R users: I don't use R much, but encourage you to search for "prop.trend.test" to learn more about trend tests in R.
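To make the idea of a trend test concrete, here is a minimal Python sketch of a Cochran-Armitage-style chi-squared test for trend in proportions across ordered categories, which I understand to be the same family of test as the R function mentioned above. It is not the Stata or SAS code from the full post. The function name, the default scores of 1, 2, 3, ..., and the counts are assumptions for illustration.

```python
import math


def trend_test(events, totals, scores=None):
    """Chi-squared test (1 df) for a linear trend in proportions.
    events[i] / totals[i] is the proportion in ordered category i.
    Assumes the overall proportion is strictly between 0 and 1."""
    if scores is None:
        scores = list(range(1, len(events) + 1))  # equally spaced scores (assumption)
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled proportion under the null
    # Score-weighted sum of observed minus expected events.
    t_stat = sum(s * (x - n * p_bar) for s, x, n in zip(scores, events, totals))
    sum_ns = sum(n * s for n, s in zip(totals, scores))
    sum_ns2 = sum(n * s * s for n, s in zip(totals, scores))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_total)
    chi2 = t_stat ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative counts: events and group sizes across four ordered categories.
events = [83, 90, 129, 70]
totals = [86, 93, 136, 82]
chi2, p_value = trend_test(events, totals)
```

A small p value here suggests the proportion changes steadily across the ordered categories; it says nothing about the direction, so check the proportions themselves.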
Practical solutions for conducting great epidemiology methods. Transparency in code. Attitude of constant improvement.
March 2021
