P for trend
Author: Bailey DeBarmore
While I'm not a big fan of p values, sometimes your coauthors, reviewers, or editors ask for them. In this post I'll show you how to calculate p for trend for ordered categories, like in a Table 1, and for adjusted odds ratios or similar regression.
R users: I don't use R much, but encourage you to search for "prop.trend.test" to learn more about trend tests in R.
Test for Trend using nptrend
If you want to compare mean values across ordered categories, call the nptrend test after tab (for categorical) or tabstat (for continuous). It is an extension of the Wilcoxon Rank Sum test.
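If you are curious what a rank-based trend test of this kind computes, here is a minimal Python sketch of Cuzick's test, the extension of the Wilcoxon rank-sum test to ordered groups. It is an illustration under stated assumptions, not Stata's implementation: the function names and the toy data are hypothetical, the group scores are taken to be the numeric group codes, and no correction for ties is applied.

```python
import math

def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def cuzick_trend(groups):
    """groups maps a numeric group score to a list of outcome values.
    Returns (z, two-sided p) from the normal approximation.
    Assumption: no tie correction is applied to the variance."""
    scores, values = [], []
    for score, obs in groups.items():
        scores.extend([score] * len(obs))
        values.extend(obs)
    n = len(values)
    ranks = midranks(values)
    t = sum(s * r for s, r in zip(scores, ranks))   # score-weighted rank sum
    score_sum = sum(scores)                          # sum of l_i * n_i
    score_sq_sum = sum(s * s for s in scores)        # sum of l_i^2 * n_i
    expected = (n + 1) / 2 * score_sum
    variance = (n + 1) / 12 * (n * score_sq_sum - score_sum ** 2)
    z = (t - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical BMI values in three ordered age groups (made-up numbers).
toy = {1: [22.1, 23.4, 24.0], 2: [24.8, 25.1, 26.3], 3: [26.9, 27.5, 29.2]}
z, p = cuzick_trend(toy)
print(round(z, 3), round(p, 4))
```

A positive z means the outcome tends to rise with the group score, and a negative z means it tends to fall.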
Binary and Ordinal Example
tab diabetes agegrp, col
nptrend diabetes, by(agegrp)
You can stratify, too.
This code produces proportions of diabetes by age group and then tests for trend by age group. The second block of code does this tabulation and trend test separately for males and females.
Note the difference in tab versus nptrend in the by group syntax: in tab, agegrp is included before the comma with no additional words, but in nptrend (and later in tabstat) you include it after the comma with a by().
Continuous and Ordinal Example
tabstat bmi, by(agegrp) stats(mean sd) format(%9.2f)
nptrend bmi, by(agegrp)
This code produces the mean and sd of BMI by age group to 2 decimal places and then tests for trend by age group. Note that the by() grouping syntax is the same in tabstat and nptrend.
The default when you omit the stats option is to only give you the mean.
Other statistics you can request are mean, count, n, sum, max, min, range, sd, variance, cv, semean, skewness, kurtosis, p1, p5, p10, p25, median, p50, p75, p90, p95, p99, iqr, q.
Note that p50 is the same as median, and q is the same as writing p25 p50 and p75.
Count is the count of nonmissing observations and is the same as n.
cv is the coefficient of variation (sd/mean) and semean is the standard error of the mean (sd/sqrt(n)).
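To make the definitions of cv and semean concrete, here is a small Python sketch that computes them by hand. The function name and the BMI values are made up for illustration, and sd here is the sample standard deviation (n - 1 in the denominator).

```python
import math
import statistics

def summarize(values):
    """Return n, mean, sd, cv, and semean for a list of nonmissing values."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample sd, n - 1 denominator
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,               # coefficient of variation
        "semean": sd / math.sqrt(n),   # standard error of the mean
    }

# Hypothetical BMI values for one age group.
stats = summarize([22.0, 24.0, 24.0, 24.0, 25.0, 25.0, 27.0, 29.0])
for name, value in stats.items():
    print(name, round(value, 3))
```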
You can stratify here, too.
ADJUSTED ESTIMATES: Test for Trend using Post-Estimation
After you conduct a regression with a categorical variable, you can test for trend using the post-estimation CONTRAST command.
You will want to indicate your categorical variable using the i. prefix in your regression statement. Then, when you call on CONTRAST (immediately after the regression) you can use a prefix for that variable that indicates the type of trend you want to look at.
Let's look at BMI (continuous) and age group in a linear regression (or ANOVA in this case).
anova bmi agegrp race
regress bmi i.agegrp race
You can run these contrast statements (among others), each selected by a prefix on the categorical variable:
1. Difference from the reference level (r.)
2. Difference from the next level (a.)
3. Difference from the previous level (ar.)
4. P for trend: linear, quadratic, cubic, quartic, and joint tests (p.)
contrast p.agegrp, noeffects
Using the p. prefix is only meaningful if you have ordinal categories.
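To see what "linear", "quadratic", and "cubic" trend mean for an ordered factor, here is a minimal Python sketch that builds orthogonal polynomial contrast weights from the level values with Gram-Schmidt. It is an illustration only: the function name is hypothetical, the levels are assumed to be coded 1, 2, 3, 4, and the scaling (unit length) may differ from what Stata reports.

```python
import math

def orthogonal_poly_contrasts(levels):
    """Return one unit-length contrast vector per polynomial degree 1..k-1,
    built by Gram-Schmidt on successive powers of the level values."""
    k = len(levels)
    basis = [[1.0 / math.sqrt(k)] * k]  # normalized constant term
    contrasts = []
    for degree in range(1, k):
        vec = [float(x) ** degree for x in levels]
        # Remove the part already explained by lower-degree terms.
        for b in basis:
            proj = sum(v * bi for v, bi in zip(vec, b))
            vec = [v - proj * bi for v, bi in zip(vec, b)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        basis.append(vec)
        contrasts.append(vec)
    return contrasts

# Four ordered age groups coded 1..4 (an assumption for this example).
for name, weights in zip(["linear", "quadratic", "cubic"],
                         orthogonal_poly_contrasts([1, 2, 3, 4])):
    print(name, [round(w, 3) for w in weights])
```

Each contrast weights the category estimates; the linear contrast rises steadily across levels, which is why it only makes sense when the categories have a natural order.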
If you're using a non-linear model, you can use the same contrast post-estimation statements after your regression, such as:
logit diabetes i.agegrp race bmi
logistic diabetes i.agegrp race bmi
Test for Trend using PROC FREQ: Binary and Ordinal
If you have a binary variable and an ordinal variable, you can use PROC FREQ to generate your trend test by requesting the Cochran-Armitage test in the TABLES statement. It will test for trend across the column variable.
As a refresher, in a TABLES row*column request the first variable is the row variable and the second is the column variable.
PROC FREQ data=[data];
You may also want to request confidence limits (CL) and measures (MEASURES) with your trend test.
You can get the same results as the Stata nptrend by specifying SCORES=MODRIDIT in the TABLES statement, after the / .
PROC FREQ data=stroke;
This code will give you a test for trend of diabetes frequency across age groups. The output you're looking for is titled "Cochran-Armitage Trend Test". The one-sided p-value is for a test of trend in a pre-defined direction. The two-sided p-value is for a test of trend when you don't know what direction to expect. (I'm partial to two-sided p-values).
A small p-value means you can reject the null hypothesis of NO TREND.
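If you want to check the arithmetic behind the test, here is a minimal Python sketch of the Cochran-Armitage trend statistic for a 2 x k table. It is a sketch under stated assumptions rather than the SAS implementation: the function name and counts are hypothetical, the scores default to 1, 2, ..., k, and the p-value comes from the normal approximation.

```python
import math

def cochran_armitage(events, totals, scores=None):
    """events[i] = number with the outcome in ordered group i,
    totals[i] = group size. Returns (z, two-sided p)."""
    k = len(totals)
    if scores is None:
        scores = list(range(1, k + 1))  # assumption: equally spaced scores
    n = sum(totals)
    p_bar = sum(events) / n             # overall proportion with the outcome
    # Score-weighted sum of observed minus expected events.
    t = sum(s * (r - m * p_bar) for s, r, m in zip(scores, events, totals))
    score_var = (sum(m * s * s for s, m in zip(scores, totals))
                 - sum(m * s for s, m in zip(scores, totals)) ** 2 / n)
    variance = p_bar * (1 - p_bar) * score_var
    z = t / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Hypothetical counts of diabetes across three ordered age groups.
z, p = cochran_armitage(events=[10, 20, 30], totals=[100, 100, 100])
print(round(z, 3), p)
```

The one-sided p-value in the observed direction is half of the two-sided value returned here.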
Test for Trend using PROC NPAR1WAY: Continuous and Ordinal
If you want to test for trend with a continuous variable across ordinal categories, you can use PROC NPAR1WAY and request the Wilcoxon Rank Sum test.
PROC NPAR1WAY data=stroke WILCOXON;
This code computes p for trend of BMI as a continuous variable across age groups. If your sample is too small for the normal approximation to be reliable, add an EXACT WILCOXON statement.
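In the output, look for the Normal Approximation two-sided p-value, where a small p-value lets you reject the null hypothesis of no trend. If you used the exact option, look for the two-sided p-value under Exact Test. Note that SAS reports these two-sample statistics only when the CLASS variable has two levels; with three or more levels it reports the Kruskal-Wallis test, which detects any difference across groups rather than an ordered trend.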
ADJUSTED ESTIMATES: Test for Trend
In the output from PROC LOGISTIC, the "Testing Global Null Hypothesis: BETA=0" is equivalent to the Cochran-Armitage test used in PROC FREQ, but for your adjusted odds ratios.
You can also ask for separate Wald tests of the betas by using the TEST statement.
PROC LOGISTIC data=[data];
Hopefully you found this post helpful in understanding exactly what your output is giving you. I know I learned a lot just by researching it for you.
10/19/2019 11:47:50 am
thank you very much, it helps!
Lindsey Adelle Wood
4/23/2020 10:36:34 am
Hi Bailey. I have a question. I am using SAS and am trying to test the trend of ORs for a categorical variable. In my test statement, I have listed all the categories separated by commas and = 0 (cat1, cat2, cat3, cat4=0). Will this give me the p-value for a linear trend of the ORs of the categories? I can't seem to find a clear answer in SAS documentation.
4/23/2020 03:00:25 pm
Hi Lindsey, great question.
7/20/2020 05:33:36 pm
7/21/2020 11:35:29 am
Hi Rilla, thanks for posting.
7/21/2020 03:29:20 pm
Thanks for the information, Bailey.
10/25/2020 12:27:30 pm
10/26/2020 01:17:11 pm
Hi Alecia! Thanks for taking the time to comment. You're completely right - the Wald test is a global test of a null hypothesis and is distribution-free. I have edited the post to remove anything about linear trends for the Wald test.
11/7/2020 09:39:25 am
Excuse me, I am a graduate school student, and beginner of statistics. I'm very sorry to ask a question about p for trend because of my homework. Is it meaningful and possible to calculate p for trend in multiple logistic regression?