Author: Bailey DeBarmoreLearning about a method in class, like inverse probability weighting, is different than implementing it in practice. This post will remind you why we might be interested in propensity scores to control for confounding - specifically inverse probability of treatment weights and SMR - and then show how to do so in SAS and Stata. If you have corresponding code in R that you'd like to add to this post, please contact me.A note about weighting versus multivariable regression:Effect estimate interpretations when you use weighting are marginal effect in the target population. When you adjust for covariates in a regression model, you are interpreting a conditional effect, that is, the effect of the exposure holding (conditional on) the covariates being constant. Conditional estimates are troublesome with time-varying covariates because we run into collider bias and conditioning on mediators, thus weights are preferable. In simpler situations, using weights over multivariable regression can help with convergence issues . Files to Download: .txt file with SAS and Stata code, as well as a PDF version of this post with code (perfect for students) available to download at the end of the post or at my github |

UnstabilizedCreate a pseudopopulation 2x the size of our observed - one where everyone is exposed and one where everyone is unexposed. | StabilizedCreate a pseudopopulation maintaining the original population size, but we adjust the covariate distribution within each strata of exposure group by upweighting and downweighting people to match the overall covariate distribution. |

## coding IPTW in SAS

******************************************

* Calculating IPTW

*****************************************;

&let data=<data>;

&let y=<outcome>;

&let x=<exposure>;

&let id=<id>;

*Estimate denominator - output a dataset with results of regression called denom, with the resulting probabilities stored in variable d;

proc logistic data=&data desc;

model &x = <covariates>;

output out=denom p=d;

run;

*Generate numerator for stabilized weights - output a dataset with results of regression called num, with the resulting probabilities stored in variable n - note that there is nothing on the right side of the equation because the numerator will simply be P(A=a), where a = observed exposure status;

proc logistic data=&data desc;

model &x=;

output out=num p=n;

run;

*Generate stabilized and unstabilized weights by merging the datasets with regression output (merge on the unique identifier in your dataset, &id);

data <newdata>;

merge &data denom num;

by &id;

if &x=1 then do;

uw = 1/d;

sw = n/d;

end;

*Remember we can use 1 - P(exposed) for the unexposed weight components;

else if &x=0 then do;

uw=1/(1-d);

sw=(1-n)/(1-d);

end;

run;

*Check the distribution of your IPTW - the mean should be 1. Is the sum for uw twice the sum of sw? why? is the range of uw greater than sw? why?;

proc means data=<newdata> mean sum min max;

var uw sw;

run;

*You can check to see if your exposure and covariates are associated in your new pseudopopulation (<newdata>);

proc logistic data=<newdata> desc;

weight sw;

model &x=<covariates>;

run;

*Now you can run your main analyses and apply the weights using the weight statement - use sw variable for stabilized weights, and use uw for unstabilized weights - you can use proc genmod, glm, logistic, etc. I'll show you below with logistic you can see now we're using &y and &x - and we don't need the covariates because the confounder -> x arrow is encompassed in the sw weight statement;

proc logistic data=<newdata> desc;

weight sw;

model &y = &x;

run;

## coding IPTW in Stata

Since using IPTW results in the average treatment effect overall, we'll use option

**ate**(average treatment effect) instead of option

**atet**(average treatment effect on the treated) that we use for SMR.

******************************************

* Calculating IPTW

*****************************************;

* Syntax for teffects statement

*teffects ipw (<outcome>) (<exposure> <covariates>), ate

*where <outcome> is your outcome variable, <exposure> is your exposure variable, and <covariates> is a list of your covariates to generate your weights.

*Example: Binary

*Outcome = lowbirthwt

*Exposure = maternalsmoke

*Covariates = maternalage nonwhite

*Use the teffects statement to generate your weights and then apply them in a logistic (default) model all in 1 step

teffects ipw (lowbirthwt) (maternalsmoke maternalage nonwhite), ate

*If your outcome is continuous, you can specify a probit model

*Example: Continuous

*Outcome = birthwt

*Exposure = maternalsmoke

*Covariates = maternalage nonwhite

teffects ipw (birthwt) (maternalsmoke maternalage nonwhite, probit), ate

* Syntax to manually create IPTW for binary exposure (treatment)

logistic treatment vars

predict p

gen iptw = 1/p if treatment==1

replace iptw=1/(1-p) if treatment==0

*To calculate stabilized IPTW, run tab treatment and use the proportions for X below

tab treatment

gen siptw = X/p if treatment==1

replace siptw = (1-X)/(1-p) if treatment==0

If your treatment (exposure) is

**categorical**you'll need to change things up a bit. I'm writing this blurb in response to a comment below, and in response to this topic coming up several times in just the past week!

In SAS, you can use code like the sample below (thanks, Paul!) which uses a generalized logit model for a treatment variable <treatment> with multiple categories.

The output data set will have probabilities for every value of treatment. For example, a person with treatment=1 will have values for treatment=0,1,2,3 so you need to do some more coding to tell the computer you only want the predicted value for treatment if treatment=treatment.

PROC LOGISTIC DATA=...;

MODEL treatment = vars / LINK=glogit;

OUTPUT OUT=denom_ipw P=d;

RUN;

mlogit treatment vars

predict p0, outcome(0)

predict p1, outcome(1)

predict p2, outcome(2)

predict p3, outcome(3)

gen iptw=.

replace iptw=1/p0 if treatment==0

replace iptw=1/p1 if treatment==1

replace iptw=1/p2 if treatment==2

replace iptw=1/p3 if treatment==3

gen siptw=.

replace siptw=0.6/p0 if treatment==0

replace siptw=0.14/p1 if treatment==1

replace siptw=0.2/p2 if treatment==2

replace siptw=0.06/p3 if treatment==3

**my GitHub**

Calculating SMR and IPW - SAS - 2021 - EPICODE | |

File Size: | 8 kb |

File Type: | txt |

Calculating SMR and IPW in Stata - 2022 | |

File Size: | 6 kb |

File Type: | txt |

Coding IPW and SMR in SAS and Stata - PDF for teachers | |

File Size: | 551 kb |

File Type: |

## Suggested citation

**About the author**

Bailey DeBarmore is a doctoral student at the University of North Carolina at Chapel Hill studying epidemiology. Find her on Twitter @BaileyDeBarmore and blogging for the American Heart Association on the Early Career Voice blog. |

Hi Bailey. Really liked your post! I have a quick question: For IPTW, I wonder if the SAS codes can also be accommodated for an exposure with more than two groups. Say if x (exposure) can be 1, 2, or 3, then should we specify the stabilizing weight to be sw=(1-n)/(1-d) for x=1 (the referent group) and sw = n/d for x=2 and x=3 (the rest two groups)?

Hi Ken,

If your exposure=0,1,2,3 then you need to calculate the probability for each of these categories conditional on your covariates.

Use a generalized logit model (proc logistic with /link=glogit in the model statement) to do that. Your output dataset will have the predicted probabilities for exposure=0,1,2,3. For person i, if they have exposure = 2, then use Pr(Exposure=2|covariates) as their weight denominator. You would throw away the pred probabilities for that person where exposure=0, 1, 3. Once you have the predicted probabilities extracted based on their predicted values and observed exposure, divide 1 by the predicted value and fit your model with all treatment categories as independent variables. Shoutout to Paul Z for helping me answer your question.

-Bailey

Thanks Bailey and Paul for helping me on this! I found a SAS document that seems to suggest a similar solution to what you described: https://support.sas.com/resources/papers/proceedings/proceedings/forum2007/184-2007.pdf

However, it seems the weights created here are unstabilized weight? Thanks again for pointing me to the right direction.

Hi Bailey,

Thank you for this detailed post!

I had a follow-up question about IPTW in STATA. You use the teffects ipw command which estimates the iptw and applies it to the outcome model all in one step. I was wondering if you'd be able to specify how to create iptw manually in Stata without using the teffects command. I ask because I'm using survey weighted data and my outcome model uses the svyset procedure which is incompatible with teffects.

Great question, Ann! I've added some code to the Stata portion that specifies how to do this. Hope that helps!

Thank you I just saw that! I did have a follow-up question, when you mention stabilizing the iptw, is it the same as normalizing the iptw? Performed as follows:

egen sum_weights = total(iptwt)

gen norm_weights = iptwt/sum_weights

Hi Ann,

They're not quite the same. See above in the post where it talks at unstabilized versus stabilized weights. Stabilized weights have the probability of the exposure in the numerator.

Thanks for the information. It's really useful.I was wondering if the process is the same for IPW for attrition?

Thanks

Hi Diego, thanks for the comment. The process would be the same or similar for IPW of attrition, you would replace your predictors of treatment with predictors of attrition. - Bailey

Baily:

So glad I found your posting when searching the web to better understand IPTW. I am working in Stata, and would like to reproduce the output from teffects using a two step process that is required in SAS. You detailed above how to generate IPTW (and SIPTW). What would the code be to take that value and generate the same output as teffects?

Thanks!

PS: Typo in downloadable Stata code - header states SAS, should be Stata

Hi Joseph, thanks for the comment (and the heads up about the typo - fixed now).

If you haven't already, I would compare the results from Stata's teffects to what you find in SAS to see if they differ. If they do, and you want to do them by hand, then I would recommend following the code for categorical exposure which uses mlogit, but instead use logistic and replace your numbers just for the two categories that you have.

-Bailey

Hi Bailey, this is an amazing post about IPTW and just what I have been looking for.

I am a STATA user and want to know if there is a way to graphically show the level of balance in characteristics between say Drug A and Drug B before and after IPTW.

You may find the command "tebalance" helpful. This article discusses it (scroll down until you see the balance plots).

https://www.stata.com/stata14/treatment-effects-balance/

Hi Bailey,

Thank you for this post. I already used your SAS code and have a question regarding the distribution of uw and sw.

The following means I got: uw=1.997 and sw=0.999. The range of uw is greater than sw.

You wrote in the code: " ... the mean should be 1. Is the sum for uw twice the sum of sw? why? is the range of uw greater than sw? why?"

Are there some reasons for the distribution of uw/sw I got?

I have data where the unexposed group is much larger than the exposed group, so the SMR weights are very small for the unexposed group. Is it valid to normalize the SMR weights for the unexposed group to have a mean of 1 (i.e. divide the SMR weights by the mean for the group). That makes the sum of all the weights similar to the total study population, but does it still mean that I am estimating ATT?

## Leave a Reply.

Practical solutions for conducting great epidemiology methods. Transparency in code. Attitude of constant improvement.

All

Bailey DeBarmore

Data Visualization

Excel

IPW

Paul Zivich

P Values

Python

R

Regression

SAS

Stata

ZEpid

March 2021

September 2020

April 2019

September 2018

August 2018

July 2018

June 2018

May 2018