P for Trend in Stata and SAS

5/22/2018

Author: Bailey DeBarmore

While I'm not a big fan of p values, sometimes your coauthors, reviewers, or editors ask for them. In this post I'll show you how to calculate p for trend for ordered categories, like in a Table 1, and for adjusted odds ratios or similar regression.

R users: I don't use R much, but encourage you to search for "prop.trend.test" to learn more about trend tests in R.

Stata

Test for Trend using nptrend

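If you want to test for a trend across ordered categories, run nptrend after tab (for categorical variables) or tabstat (for continuous variables). nptrend is a nonparametric, rank-based test for trend across ordered groups. It is an extension of the Wilcoxon rank-sum test, so it works with rank sums rather than comparing means directly.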

Binary and Ordinal Example

tab diabetes agegrp, col
nptrend diabetes, by(agegrp)

You can stratify, too.

sort male
by male: tab diabetes agegrp, col
nptrend diabetes if male==0, by(agegrp)
nptrend diabetes if male==1, by(agegrp)

This code produces proportions of diabetes by age group and then tests for trend by age group. The second block of code does this tabulation and trend test separately for males and females.

Note the difference in tab versus nptrend in the by group syntax: in tab, agegrp is included before the comma with no additional words, but in nptrend (and later in tabstat) you include it after the comma with a by().

Continuous and Ordinal Example

tabstat bmi, by(agegrp) stats(mean sd) format(%9.2f)
nptrend bmi, by(agegrp)

This code produces the mean and sd of BMI by age group to 2 decimal places and then produces a test for trend by age group. Note that the syntax of the by grouping is the same in both tabstat and nptrend.

The default when you omit the stats option is to only give you the mean.

The statistics you can request are mean, count, n, sum, max, min, range, sd, variance, cv, semean, skewness, kurtosis, p1, p5, p10, p25, median, p50, p75, p90, p95, p99, iqr, q.

Note that p50 is the same as median, and q is the same as writing p25 p50 p75.

Count is the count of nonmissing observations and is the same as n.

CV is the coefficient of variation (sd/mean) and semean is the standard error of the mean (sd/sqrt(n)).
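If you want to try these statistics without your own data, here is a minimal sketch using Stata's built-in auto dataset. The variables mpg and rep78 are stand-ins I picked for illustration (rep78 is a 1-5 repair record acting as the ordered category); they are not from the example data used in this post.

```stata
* Built-in example data; rep78 (1-5) stands in for an ordered category
sysuse auto, clear

* n is the same as count; q expands to p25 p50 p75
tabstat mpg, by(rep78) stats(n mean sd semean cv q) format(%9.2f)
```

Cars with a missing rep78 are left out of the by-group rows unless you add the missing option.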

You can stratify here, too.

sort male
by male: tabstat bmi, by(agegrp) stats(mean sd) format(%9.2f)
nptrend bmi if male==0, by(agegrp)
nptrend bmi if male==1, by(agegrp)

ADJUSTED ESTIMATES: Test for Trend using Post-Estimation

After you fit a regression with a categorical variable, you can test for trend using the post-estimation contrast command.

You will want to indicate your categorical variable using the i. prefix in your regression statement. Then, when you call contrast (immediately after the regression), you swap the i. for a contrast operator on that variable that indicates the type of comparison or trend you want to look at.

Let's look at BMI (continuous) and age group in a linear regression (or ANOVA in this case).

anova bmi agegrp race

regress bmi i.agegrp race

You can run these contrast statements (and others)

1. Difference from reference level

contrast r.agegrp

2. Difference from next level

contrast a.agegrp

3. Difference from previous level

contrast ar.agegrp

4. P for trend: linear, quadratic, cubic, quartic, and joint tests

contrast p.agegrp, noeffects

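Using the p. prefix is only meaningful if you have ordinal categories. It builds orthogonal polynomial contrasts from the level values, and the number of polynomial terms you get is one fewer than the number of categories (so five categories give you linear through quartic).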
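Here is a small runnable sketch of the trend contrast, assuming Stata's built-in auto dataset rather than the data used in this post. The variables mpg, rep78, and weight are stand-ins chosen for illustration: rep78 (coded 1-5) plays the role of the ordered category.

```stata
* Built-in example data; rep78 (1-5) stands in for an ordered category
sysuse auto, clear

* Enter the ordered variable as categorical, adjusting for weight
regress mpg i.rep78 weight

* Orthogonal polynomial contrasts in the level values:
* linear, quadratic, cubic, quartic, and a joint test
contrast p.rep78, noeffects

* q. uses the level sequence (1st, 2nd, ...) instead of the level values;
* with evenly spaced codes such as 1-5 it should agree with p.
contrast q.rep78, noeffects
```

The q. operator is worth knowing about when your category codes are unevenly spaced (for example 1, 2, 5, 10) but you want to treat the steps between categories as equal.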

If you're using a non-linear model, you can use the same contrast post-estimation statements after your regression, such as:

logit diabetes i.agegrp race bmi
contrast p.agegrp, noeffects

logistic diabetes i.agegrp race bmi
contrast p.agegrp, noeffects
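For a version you can run as is, here is a sketch using the lbw example dataset from the Stata manuals (webuse needs an internet connection). The agecat variable is a hypothetical ordered exposure I create just for illustration. The last command shows a different, commonly used way to get a p for trend, which is not the contrast approach above: enter the ordered variable as a single numeric term and read the test on its coefficient.

```stata
* Example data from the Stata manuals (needs an internet connection)
webuse lbw, clear

* Hypothetical ordered exposure: four age groups coded 0-3
egen agecat = cut(age), group(4)

* Categorical coding, then polynomial trend contrasts
logistic low i.agecat lwt i.smoke
contrast p.agecat, noeffects

* Alternative (an assumption on my part, not the method above):
* a single numeric term, so the z test on agecat is a 1-df test
* for linear trend in the log odds per one-category step
logistic low c.agecat lwt i.smoke
```

The two p-values for linear trend will usually be close but need not match exactly, because the first model estimates a separate coefficient for each category and the second forces a straight line on the log-odds scale.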

SAS

Test for Trend using PROC FREQ: Binary and Ordinal

If you have a binary variable and an ordinal variable, you can use PROC FREQ to generate your trend test by requesting the Cochran-Armitage test with the TREND option in the TABLES statement. It will test for trend across the column variable.

Just a refresher for which is the row and which is the column variable.

PROC FREQ data=[data];
TABLES row * col / trend;
run;

You may also want to request confidence limits (CL) and measures (MEASURES) with your trend test.

To get results in line with Stata's nptrend, specify SCORES=MODRIDIT in the TABLES statement, after the slash (/).

PROC FREQ data=stroke;
TABLES diabetes * agegrp / trend;
run;

This code will give you a test for trend of diabetes frequency across age groups. The output you're looking for is titled "Cochran-Armitage Trend Test". The one-sided p-value is for a test of trend in a pre-defined direction. The two-sided p-value is for a test of trend when you don't know what direction to expect. (I'm partial to two-sided p-values).

A small p-value means you can reject the null hypothesis of NO TREND.

Test for Trend using PROC NPAR1WAY: Continuous and Ordinal

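If you want to compare a continuous variable across ordinal categories, you can use PROC NPAR1WAY with the WILCOXON option. With two groups this gives you the Wilcoxon rank-sum test. With more than two groups, SAS reports the Kruskal-Wallis test, which picks up any difference among the groups rather than an ordered trend specifically. If you need a rank-based test against an ordered alternative, the JT option in the PROC FREQ TABLES statement requests the Jonckheere-Terpstra test.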

PROC NPAR1WAY data=stroke WILCOXON;
CLASS agegrp;
VAR bmi;
*exact wilcoxon;
run;

This code tests whether the distribution of BMI (as a continuous variable) differs across age groups. Note that if your sample is too small for the large-sample approximation to be reliable, you should uncomment the "exact wilcoxon" statement.

In the output, look for the Normal Approximation two-sided p-value (reported when you have two groups) or the Kruskal-Wallis Test p-value (more than two groups), where a small p-value lets you reject the null hypothesis of no difference across groups. If you used the exact option, look for the two-sided p-value under Exact Test.

ADJUSTED ESTIMATES: Test for Trend

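In the output from an unadjusted PROC LOGISTIC model, with the ordinal variable entered as a single numeric term, the Score test in the "Testing Global Null Hypothesis: BETA=0" table is equivalent to the Cochran-Armitage test used in PROC FREQ. Once you add covariates, that table tests all of the betas jointly, so it is no longer a test for trend in one variable.

For adjusted odds ratios, you can ask for a separate Wald test of a specific beta by using the TEST statement. In the code below, agegrp is not listed in a CLASS statement, so it is modeled as a single numeric term and the test is of its slope across categories.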

PROC LOGISTIC data=[data];
MODEL diabetes = agegrp bmi race;
TEST agegrp;
run;

Hopefully you found this post helpful in understanding exactly what your output is giving you. I know I learned a lot just by researching it for you.

Bailey

About the Author

Bailey DeBarmore is a doctoral student at the University of North Carolina at Chapel Hill studying epidemiology. Find her on Twitter @BaileyDeBarmore and blogging for the American Heart Association on the Early Career Voice blog.

13 Comments

Melkamu Merid

10/19/2019 11:47:50 am

thank you very much, it helps!

Lindsey Adelle Wood

4/23/2020 10:36:34 am

Hi Bailey. I have a question. I am using SAS and am trying to test the trend of ORs for a categorical variable. In my test statement, I have listed all the categories separated by commas and = 0 (cat1, cat2, cat3, cat4=0). Will this give me the p-value for a linear trend of the ORs of the categories? I can't seem to find a clear answer in SAS documentation.

Bailey

4/23/2020 03:00:25 pm

Hi Lindsey, great question.

Using the TEST statement in SAS will indeed test linear hypotheses. The specifications are just what type of test you want to use to test that linear trend, such as Wald (default). You can use the option "all" after your test statement to show results for all the tests (see more here: https://bit.ly/2VUaB2q). So in your example, you'll be testing for a linear relationship between the different categories of your variable. Note that by using different categories, or categorical coding, you are already assuming that the differences between your categories are the same.

Bailey

Rilla

7/20/2020 05:33:36 pm

Hi Bailey,

I've been trying to use -nptrend- to show the presence/non-presence of a trend between 7 ordered categories and a continuous variable. The sample is about 50,000, and is pretty unbalanced - i.e. around 50% of the sample is in category 1, 15% in category 2, intermediate proportions in categories 3-6, and 2% in category 7.

Box plots show me that there is an upward trend, i.e. the value of the continuous variable increases with increasing category. When I apply -nptrend- the test appears significant, even when I alter the order of the categories...

My feeling is that, given -nptrend- is an extension of the Wilcoxon, it could be sensitive to differences in the distribution of the continuous variable within each category. Do you know if there are sample size limits to -nptrend-?

7/21/2020 11:35:29 am

Hi Rilla, thanks for posting.

I don't have a great answer to your question about distribution within groups or sample size, but I can tell you this: nptrend is testing the mean values in your groups, and changing the order won't change the results (see the example here https://www.stata.com/manuals13/rnptrend.pdf). Even if your boxplots look like an increasing trend, if the means are all within the confidence intervals of one another, you may not see a trend. You may want to look into other trend tests, since your categories are prespecified rather than percentiles, to make sure you're using the most robust test.

Best,
Bailey

7/21/2020 03:29:20 pm

Thanks for the information, Bailey.

Alecia James

10/25/2020 12:27:30 pm

Hello Bailey,

Thank you for your post. This is definitely the best guide I've seen so far for what I'm trying to do. I'm trying to do a linear test for trend using logistic regression. When working with unadjusted odds ratio, I found a guide that explained that you can use the Score test in the "Testing Global Null Hypothesis Table" as that is equivalent to the Cochran-Armitage Trend Test.

My question is regarding what you stated below:

" but for your adjusted odds ratios.

You can also ask for separate Wald tests for linear trend of the betas by using the TEST statement.
PROC LOGISTIC data=[data];
MODEL diabetes = agegrp bmi race;
TEST agegrp;
run; "

I'm doing a multivariate logistic regression, and I would like to obtain the p-value for trend for my main predictor variable, while controlling for other variables. I understand the syntax you have shared above as well.

My question is, does the Wald score give a test for linear trend? I was under the impression that it is the Score test that gave the trend (based on what I read prior).

Secondly, when entering the variables in the model, do I have to enter the main predictor variable as a continuous variable (when doing the test for trend), or can it be entered as a categorical variable? My goal really is to test if there is a trend for the ORs obtained for different categories of the main predictor variable.

Thank you

10/26/2020 01:17:11 pm

Hi Alecia! Thanks for taking the time to comment. You're completely right - the Wald test is a global test of a null hypothesis and is distribution-free. I have edited the post to remove anything about linear trends for the Wald test.

For your second question, you can use a categorical. In fact, in the example, the 'agegrp' variable stands for age group.

Bailey

PS: multivariate refers to regression with multiple outcomes, while multivariable refers to regression with additional independent variables.

May

11/7/2020 09:39:25 am

Excuse me, I am a graduate school student, and beginner of statistics. I'm very sorry to ask a question about p for trend because of my homework. Is it meaningful and possible to calculate p for trend in multiple logistic regression?

Kexin

12/7/2020 03:56:22 pm

Hi Bailey,

Thank you so much for the information!
I have a question for p for trend using SAS. In Proc Logistic, I have a regression involving an interaction term.

For example:

Model diabetes = sleep gender sleep*gender

(sleep is a 3 level ordinal categorical variable which is the main predictor, diabetes is Y/N, and gender is boys/girls)

I'm wondering if you know how to use Test statement to obtain the p for trend for ORs across sleep categories, separately for boys and girls? I believe the added complexity would be the interaction term, so I'm not sure if there is a way to test for trend. If Test statement cannot achieve this, do you recommend any other methods?

Thanks in advance!
Kexin

12/7/2020 04:34:39 pm

Hi Kexin,

You should be able to use testparm to test that the combination of sleep gender and sleep*gender are significant together.

Bailey

Jiali

8/16/2021 11:09:23 am

Really helpful~

Jane

6/21/2023 08:46:15 am

Hi Bailey,

Thanks for your blog, it is super helpful!

Just got a question at the end about the "Testing Global Null Hypothesis: BETA=0". Does this really give a test for trend across a categorical exposure? Based on my previous knowledge, this only provides evidence on whether we should reject the intercept-only model (in favor of the current model)?

Thanks.
Jane
