Author: Bailey DeBarmore
While I'm not a big fan of p-values, sometimes your coauthors, reviewers, or editors ask for them. In this post I'll show you how to calculate p for trend across ordered categories, like in a Table 1, and for adjusted odds ratios or similar regression estimates.
R users: I don't use R much, but encourage you to search for "prop.trend.test" to learn more about trend tests in R.
Test for Trend using nptrend
If you want to test for a trend across ordered categories, run nptrend after tab (for a categorical variable) or tabstat (for a continuous variable). nptrend is a nonparametric, rank-based test for trend that extends the Wilcoxon rank-sum test to more than two ordered groups.
Binary and Ordinal Example
tab diabetes agegrp, col
You can stratify, too.
This code produces proportions of diabetes by age group and then tests for trend by age group. The second block of code does this tabulation and trend test separately for males and females.
Note the difference in tab versus nptrend in the by group syntax: in tab, agegrp is included before the comma with no additional words, but in nptrend (and later in tabstat) you include it after the comma with a by().
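Here is a minimal sketch of the two blocks described above. It uses the variable names from this post (diabetes, agegrp); the stratifying variable sex and the loop over its levels are my assumptions, and the loop expects sex to be numeric. The by() option follows the syntax described in this post. I believe Stata 17 and later document the grouping option as group() instead, so check help nptrend for your version.

```stata
* Column proportions of diabetes by age group, then the test for trend
tab diabetes agegrp, col
nptrend diabetes, by(agegrp)

* Stratified: repeat both commands within each level of sex
* (sex is a hypothetical numeric variable)
levelsof sex, local(sexlevels)
foreach s of local sexlevels {
    display "sex = `s'"
    tab diabetes agegrp if sex == `s', col
    nptrend diabetes if sex == `s', by(agegrp)
}
```

Using an if qualifier inside a loop avoids hard-coding how sex is coded in your dataset.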
Continuous and Ordinal Example
tabstat bmi, by(agegrp) stats(mean sd) format(%9.2f)
This code produces the mean and sd of BMI by age group to 2 decimal places and then produces a test for trend by age group. Note that the syntax of the by grouping is the same in tabstat and nptrend: both take by(agegrp) after the comma.
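As a sketch, the full pair of commands for this example might look like the following. The nptrend line is written with the by() syntax described in this post; newer Stata versions may expect group() instead, so treat the option name as an assumption to check against help nptrend.

```stata
* Mean and SD of BMI by age group, shown to 2 decimal places
tabstat bmi, by(agegrp) stats(mean sd) format(%9.2f)

* Nonparametric test for trend in BMI across age groups
nptrend bmi, by(agegrp)
```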
The default when you omit the stats option is to only give you the mean.
Other statistics you can request are mean, count, n, sum, max, min, range, sd, variance, cv, semean, skewness, kurtosis, p1, p5, p10, p25, median, p50, p75, p90, p95, p99, iqr, q.
Note that p50 is the same as median, and q is the same as writing p25 p50 and p75.
Count is the count of nonmissing observations and is the same as n.
cv is the coefficient of variation (sd/mean) and semean is the standard error of the mean (sd/sqrt(n)).
You can stratify here, too.
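One way to stratify is sketched below, assuming a numeric stratifying variable named sex (a hypothetical name, not something defined earlier in this example). The loop runs the summary table and the trend test once per level.

```stata
* Repeat the summary table and trend test within each level of sex
* (sex is a hypothetical numeric variable)
levelsof sex, local(sexlevels)
foreach s of local sexlevels {
    display "sex = `s'"
    tabstat bmi if sex == `s', by(agegrp) stats(mean sd) format(%9.2f)
    nptrend bmi if sex == `s', by(agegrp)
}
```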
ADJUSTED ESTIMATES: Test for Trend using Post-Estimation
After you fit a regression with a categorical variable, you can test for trend using the post-estimation contrast command.
Mark your categorical variable with the i. prefix in your regression statement. Then, when you call contrast (immediately after the regression), put a contrast operator in front of that variable to indicate the type of comparison or trend you want to look at.
Let's look at BMI (continuous) and age group in a linear regression (or ANOVA in this case).
anova bmi agegrp race
regress bmi i.agegrp race
You can run these contrast statements (and others):
1. Difference from reference level
2. Difference from next level
3. Difference from previous level
4. P for trend: linear, quadratic, cubic, quartic, and joint
contrast p.agegrp, noeffects
Using the p. prefix is only meaningful if you have ordinal categories.
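The four contrasts listed above map onto Stata's contrast operators r., a., ar., and p. Here is a minimal sketch using the BMI model from this section. I've written race as i.race on the assumption that it is categorical, which matches how anova treats it. The noeffects option hides the table of individual contrast estimates, so you only see the tests.

```stata
* Fit the adjusted model; i. marks agegrp and race as categorical
regress bmi i.agegrp i.race

* 1. Each level versus the reference (base) level
contrast r.agegrp

* 2. Each level versus the next level
contrast a.agegrp

* 3. Each level versus the previous level
contrast ar.agegrp

* 4. Orthogonal polynomial contrasts (linear, quadratic, ...) and the joint test
contrast p.agegrp, noeffects
```

Run each contrast right after the regression, since contrast works from the most recent estimation results.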
If you're fitting a different kind of model, such as logistic regression, you can use the same contrast post-estimation statements after your regression. For example:
logit diabetes i.agegrp race bmi
logistic diabetes i.agegrp race bmi
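Put together, a p for trend for adjusted odds ratios might be sketched like this. The i.race term assumes race is categorical. Note that the trend is tested on the scale of the model coefficients (log odds), not on the odds ratios themselves.

```stata
* Odds ratios for diabetes by age group, adjusted for race and BMI
logistic diabetes i.agegrp i.race bmi

* P for trend across age groups (linear, quadratic, ... and joint)
contrast p.agegrp, noeffects
```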
Test for Trend using PROC FREQ: Binary and Ordinal
If you have a binary variable and an ordinal variable, you can use PROC FREQ to run the Cochran-Armitage trend test by adding the TREND option to the TABLES statement. It will test for trend across the column variable.
As a refresher, in the TABLES statement the row variable comes first and the column variable second (row*column).
PROC FREQ data=[data];
You may also want to request confidence limits (CL) and measures (MEASURES) with your trend test.
You can get the same results as the Stata nptrend by specifying SCORES=MODRIDIT in the TABLES statement, after the / .
PROC FREQ data=stroke;
This code will give you a test for trend of diabetes frequency across age groups. The output you're looking for is titled "Cochran-Armitage Trend Test". The one-sided p-value is for a test of trend in a pre-defined direction. The two-sided p-value is for a test of trend when you don't know what direction to expect. (I'm partial to two-sided p-values).
A small p-value means you can reject the null hypothesis of NO TREND.
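If it helps to see what the test is doing, here is the usual form of the Cochran-Armitage statistic. The notation is my own for illustration: the ordinal variable has k levels with scores s_i, n_i subjects at level i of whom r_i have the outcome, and N subjects in total.

```latex
\[
\bar{p} = \frac{\sum_{i=1}^{k} r_i}{N}, \qquad
\bar{s} = \frac{\sum_{i=1}^{k} n_i s_i}{N}, \qquad
Z = \frac{\sum_{i=1}^{k} r_i \,(s_i - \bar{s})}
         {\sqrt{\bar{p}\,(1 - \bar{p}) \sum_{i=1}^{k} n_i \,(s_i - \bar{s})^2}}
\]
```

Under the null hypothesis of no trend, Z is approximately standard normal, which is where the one-sided and two-sided p-values come from. Changing the SCORES= option changes the s_i used here.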
Test for Trend using PROC NPAR1WAY: Continuous and Ordinal
If you want to test for trend with a continuous variable across ordinal categories, you can use PROC NPAR1WAY and request the Wilcoxon Rank Sum test.
PROC NPAR1WAY data=stroke WILCOXON;
This code would compute p for trend of BMI as a continuous variable across age groups. Note that if your sample is too small for the normal approximation to be reliable, you should include the EXACT WILCOXON statement.
In the output, look for the Normal Approximation two-sided p-value, where a small p-value lets you reject the null hypothesis of no trend. If you used the exact option, look for the two-sided p-value under Exact Test.
ADJUSTED ESTIMATES: Test for Trend
In the output from PROC LOGISTIC, the "Testing Global Null Hypothesis: BETA=0" is equivalent to the Cochran-Armitage test used in PROC FREQ, but for your adjusted odds ratios.
You can also ask for separate Wald tests of the betas by using the TEST statement.
PROC LOGISTIC data=[data];
Hopefully you found this post helpful in understanding exactly what your output is giving you. I know I learned a lot just by researching it for you.