BAILEY DEBARMORE
  • Home
  • Productivity
    • Blog
    • My Recs
  • EPI
    • EPICODE
    • #EpiWritingChallenge >
      • About the EWC
      • All Posts
  • Wellness
    • Health Blog
    • My Recs
  • Freebies

Interpreting Multinomial Logistic Regression in Stata

4/14/2019

7 Comments

 
Author: Bailey DeBarmore
You may find yourself running a multinomial logistic regression, but unsure how to interpret your output. I get these questions alot from students, so I'm here to help demystify your Stata results. 

Running the regression

To run a multinomial logistic regression, you'll use the command -mlogit-. 

You can see the code below that the syntax for the command is mlogit, followed by the outcome variable and your covariates, then a comma, and then base(#).

​In this example I have a 4-level variable, hypertension (htn). I want the reference category, or the base outcome, to be normal BP, which corresponds to htn=0. So I'll use base(0) in my code. 
​*Analysis question: 
*Estimate the association of diabetes with hypertension stage
*Adjusting for sex and age
*SYNTAX: mlogit <outcome> <covariates>, base(#)

*the option base(#) allows you tell Stata what level of the outcome variable
*should be the reference - for us we'll make it htn = 0 (normal)
set more off
set line 200

*Run the model

mlogit htn diabetes female age, base(0) 
Interpreting output from Stata mlogit command | EPICODE
Figure 1. Stata output from running an mlogit command with a 4-level hypertension outcome, with diabetes, female sex, and age (yrs) as covariates. Each box with repeated covariates corresponds to a level of the outcome compared to the reference (which we indicated in base(#)).
Above is the Stata output from running the mlogit command. 

You can see that there is a box at the top for htn=0, because we set that as the base outcome. If we had set the base outcome to be htn=2, we would have covariate output for 0, 1, and 3, and where the 2 box is, would be a blank with (base outcome). 

Each box corresponds to the estimated log odds of that covariate for one outcome level versus the base outcome. You can see where htn=1, it's estimating P(Elevated BP) vs P(Normal BP) (on the natural log scale). 

The Equations

Let's connect this output with the regression equation. When I want to pull estimates, I often enter in the coefficients to an MS Excel spreadsheet, and knowing how the output translates to the equation is important.
Writing out the equation for multinomial regression | EPICODE
Figure 2. How do we bring our regression output back to the statistical equation? You can write out the equation for each otucome versus the reference, here HTN = 0 or normal BP, with beta (coefficient) subscripts that correspond to the level of the outcome.
Usually we write out the equation with just beta-0, beta-1, beta-2, etc. but since we have multiple levels of the outcome, each coefficient will be prefixed by X, which indicates the level of the variable (gray equation). I have it written out for each HTN level. 

Let's apply it in an example using elevated BP vs normal.
Example multinomial equation from mlogit in Stata | EPICODE
Figure 3. Here's an example calculation where we plug in the coefficients for HTN = 1, elevated BP, into the equation we wrote out in Figure 2.
Helpful Links
​

UCLA Institute for Digital Research and Education
  • Annotated Output for Multinomial Logistic Regression in Stata
  • Multinomial Logistic Regression in Stata Data Analysis Examples

Stata Documentation for mlogit

Exponentiate

But we are really interested in the exponentiated coefficients, or the relative risk ratio in this scenario. In other Stata regression, we can use the option "or" or "exp" to transform our coefficients into the ratio. With -mlogit-, you do something a bit different - you use the option rrr in a statement run right after your regression and Stata will transform the log odds into the relative probability ratios, or the relative risk ratio (RRR).

*Get exponentiated results;
mlogit htn diabetes female age, base(0) 
mlogit,rrr

*Is the effect of diabetes on Elevated BP vs normal BP similar for Stage 2 HTN vs normal BP?
test [1]diabetes = [3]diabetes
Exponentiated mlogit results in Stata | EPICODE
Figure 4. To get the relative risk ratio (RRR), we run "mlogit, rrr" after running our regression. In this figure you'll see interpretations of the elevated BP vs normal BP in the grey box, and a test for the effect of diabetes in 2 outcome levels (orange box).
The output format when we run -mlogit, rrr- is the same as before, but we have exponentiated betas.  If you use a calculator and exponentiate the betas in the original output you'll see they match up.  

I've interpreted the RRR for elevated BP vs normal BP in the grey box. 

You may be interested if the effect of one covariate is the same across levels of the outcome. For example, does the effect of diabetes differ when we look at elevated BP vs normal BP versus stage 2 hypertension vs normal BP? 

We can use the test command and indicate the level of the outcome in [ ] 's. 

By writing the test statement out that the values are equal to each other, we are testing the null hypothesis that they are equal, or that their difference is zero. The prob > chi2 gives us the probability of observing a more extreme chi2 value, and here our p-value of 0.16 indicates we won't be rejecting the null this time around -> the effect of diabetes in elevated BP vs normal BP versus stage 2 HTN vs normal BP is similar. 

Marginal Probabilities

Margins Postestimation Command in MLOGIT - Stata | EPICODE
Figure 5. You can estimate the predicted probability of diabetes at each level of the outcome, holding the other covariates at their means. Each box corresponds to an outcome level.
If you want to estimate the predicted probability of each outcome for those with adn without diabetes, you can use the margins command. You run the margins command for each level of outcome. Be sure that your factor variable of interest (diabetes in the example) is run in the regression as a factor variable (i.variable). There's usually no need to do this with binary outcomes, so you may not have. Just re-run your regression with i.variable (you can even do so 'quietly') and then run margins. Note that if you want to always run your covariates as factor variables (binary or categorical) you can do so. For a binary variable it will just give you 1.variable for a 0-1 variable, or you can tell Stata you want 1 to be the reference with ib1.variable. 

With the margins command you can set each covariate to a level, such as female=1 (the average sex part here doesn't mean much) or you can predict atmeans (which is useful for age). 

About the Author

Picture
Bailey DeBarmore is a doctoral student at the University of North Carolina at Chapel Hill studying epidemiology. Find her on Twitter @BaileyDeBarmore and blogging for the American Heart Association on the Early Career Voice blog. ​​​​
7 Comments
sai
9/12/2020 02:28:24 am

Nice post! This is a very nice blog that I will definitively come back to more times this year! Thanks for informative post.
<a href="https://360digitmg.com/india/artificial-intelligence-ai-and-deep-learning-in-guduvanchery">visit here</a>

Reply
Stephen
9/13/2020 05:35:38 pm

Hi Bailey,

Your post comes handy at the very point I am stuck eith my research. I have performed the MLR on Stata and gotten the results including the results for "mlogit, rrr". I am at the stage testing if the effect of one covariate is the same across levels of the outcome. I choose one of my covariate- DEBTTA to test but I am getting an error message from Stata "=exp required". Perphaps I am not using the right command. Please help

test [1] (DEBTTA) = [2] (DEBTTA)
=exp required

Reply
Bailey
9/13/2020 08:39:11 pm

The syntax for the test statement follows how you have your outcome coded. In the example, diabetes is 1, 2, or 3. If we had an outcome "marriage status" coded as married, divorced, separated (example) then that is what would be in the brackets.

test [divorced]status = [married]status

Since you are testing a covariate at different levels, double check that your syntax matches. it should be:
test [<outcomelevel>]#.DEBTTA = [<outcomelevel>]#.DEBTTA where <outcomelevel> refers to the 1 2 or 3 for the diabetes example or the married, divorced, separated for the marriage example and # is the level of the covariate DEBTTA.

You may find this webpage helpful: https://stats.idre.ucla.edu/stata/dae/multinomiallogistic-regression/

Bailey

Reply
Stephen
9/13/2020 08:50:44 pm

Thanks for the response.

I have four outcome levels and I am testing effect of DEBTTA on level 1 and 2, and then later maybe 2 and 3. 0 is the base outcome level

I have use the command as per your reply:

test [1]#. DEBTTA = [2]#. DEBTTA

I still get error from Stata:

invalid varname

Stephen
9/14/2020 11:27:52 pm

Hi,

Goodness! I eventually got the syntax right. Many thanks Bailey

test [ST]DEBTTA = [SST]DEBTTA

( 1) [ST]DEBTTA - [SST]DEBTTA = 0

chi2( 1) = 5.74
Prob > chi2 = 0.0166

This implies the Ho can be rejected on 95% confidence level and the effect of the covariate DEBTTA is different for ST and SST firms. hope I am right?
Secondly, if the outcome ST and SST are financial distress states. Does this also test this hypothesis:
- Ho: DEBTTA have no significant impact on financial distress states

Reply
Seid
8/7/2022 11:32:41 am

hi dear, when I run MNL the iteration says not concave ...and it cann't show me the result. SO, what shall I do?

Reply
Bailey
8/30/2022 10:45:35 am

A quick search found this forum: https://www.statalist.org/forums/forum/general-stata-discussion/general/1337409-not-concave-multinomial-logit

Please take a look and see if it applies to you. If you are using indicator variables, try using i. notation instead.

Another reason you may be getting that error is because you have too many categories in your outcome/dependent variable.

-Bailey

Reply

Your comment will be posted after it is approved.


Leave a Reply.

    Picture
    Picture

    Practical solutions for conducting great epidemiology methods. Transparency in code. Attitude of constant improvement.

    Appreciate my stuff?

    Picture

    Picture
    Picture
    Picture
    Picture

    Picture
    Picture

    All
    Bailey DeBarmore
    Data Visualization
    Excel
    IPW
    Paul Zivich
    P Values
    Python
    R
    Regression
    SAS
    Stata
    ZEpid


    Picture

    March 2021
    September 2020
    April 2019
    September 2018
    August 2018
    July 2018
    June 2018
    May 2018


    RSS Feed

BLOGS

Work & Productivity
Health and Nutrition
EPICODE

About

About Bailey
CV and Resume
CONTACT

RD EXAM

Study Smarter Method
RD Exam Resources

FIND ME ON

Facebook
LinkedIn
Twitter
Google Scholar
Research Gate
Terms & Conditions | Privacy Policy | Disclaimers
Copyright Bailey DeBarmore © 2020
  • Home
  • Productivity
    • Blog
    • My Recs
  • EPI
    • EPICODE
    • #EpiWritingChallenge >
      • About the EWC
      • All Posts
  • Wellness
    • Health Blog
    • My Recs
  • Freebies