Hello,
I need to perform a CMH test with 3 stratification variables: AREA, OBESE, SMOKE.
I'm performing the individual CMH tests against the placebo treatment.
The analysis variable I'm using is IMPROVE, and I'm trying to analyze it by TRT (treatment) at each visit.
The data are below, as well as what I think the code should look like.
Thanks!
Note: These data probably don't make sense because I had to dummy up values.
Also, apart from baseline, the missing values in IMPROVE are intentional. Am I supposed to subset the data so that the analysis variable has no missing values?
data _have1;
infile datalines dsd dlm=",";
length trt visit $20;
input subject $ trt $ visit $ area $ obese $ smoke $ improve $;
datalines;
001, active 1, baseline, east, Y, Y,
001, active 1, day 2, east, Y, Y, Y
001, active 1, day 3, east, Y, Y,
001, active 1, day 4, east, Y, Y, Y
001, active 1, day 5, east, Y, Y, Y
002, active 2, baseline, west, N, N,
002, active 2, day 2, west, N, N,
002, active 2, day 3, west, N, N, N
002, active 2, day 4, west, N, N,
002, active 2, day 5, west, N, N, N
003, placebo, baseline, east, Y, Y,
003, placebo, day 2, east, Y, Y, Y
003, placebo, day 3, east, Y, Y,
003, placebo, day 4, east, Y, Y, Y
003, placebo, day 5, east, Y, Y,
004, placebo, baseline, north, Y, Y,
004, placebo, day 2, north, Y, Y, Y
004, placebo, day 3, north, Y, Y, Y
004, placebo, day 4, north, Y, Y, Y
004, placebo, day 5, north, Y, Y, Y
005, active 1, baseline, north, Y, Y,
005, active 1, day 2, north, Y, Y, Y
005, active 1, day 3, north, Y, Y, Y
005, active 1, day 4, north, Y, Y, N
005, active 1, day 5, north, Y, Y, Y
006, active 2, baseline, west, Y, Y,
006, active 2, day 2, west, Y, Y, N
006, active 2, day 3, west, Y, Y, N
006, active 2, day 4, west, Y, Y,
006, active 2, day 5, west, Y, Y, N
;
run;
data have1; set _have1;
* create baseline flag;
if upcase(visit)="BASELINE" then blfl="Y";
run;
proc freq data=have1;
by visit;
table area*obese*smoke*trt*improve/ cmh out=dsnout;
quit;
proc freq data=sashelp.heart ;
table bp_status*status*sex/cmh ;
output out=want cmh;
run;
First, a bit about coding style for an example.
Just include the additional program statements in the first data step; there is no need to create a second data set (for this example, anyway). Or don't bother, since you aren't showing any use of the BLFL variable.
data have1;
   infile datalines dsd dlm=",";
   length trt visit $20;
   input subject $ trt $ visit $ area $ obese $ smoke $ improve $;
   /* create baseline flag */
   if upcase(visit)="BASELINE" then blfl="Y";
   /* the same data lines as in the question follow DATALINES */
datalines;
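Second, to use BY VISIT in PROC FREQ you either have to sort the data by VISIT or use the NOTSORTED option on the BY statement. Otherwise you get this in the log and incomplete output. (Note that NOTSORTED only groups consecutive records with the same value; in this data the visits are interleaved within each subject, so NOTSORTED would make nearly every record its own BY group, and sorting is the practical fix here.)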
ERROR: Data set WORK.HAVE1 is not sorted in ascending sequence. The current BY group has visit = day 5 and the next BY group has visit = baseline.
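A minimal sketch of the sort-first fix, assuming the HAVE1 data set built in the question; HAVE1_BYVISIT is only an illustrative name for the sorted copy.

```sas
/* Sort by VISIT so each visit forms one contiguous BY group */
proc sort data=have1 out=have1_byvisit;
   by visit;
run;

/* AREA, OBESE and SMOKE define the strata; TRT*IMPROVE is the table tested */
proc freq data=have1_byvisit;
   by visit;
   tables area*obese*smoke*trt*improve / cmh;
run;
```

Because VISIT is character, it sorts alphabetically. "baseline" through "day 5" happen to come out in visit order here, but a value such as "day 10" would sort before "day 2", so a numeric visit number would be safer in real data.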
You are also getting messages like this for several levels of VISIT:
NOTE: No statistics are computed for trt * improve because all data are missing.
NOTE: The above message was for the following BY group: visit=baseline
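This is also related to the failure to sort: each BY group processed before the error contains only the first record for that visit, so there is only one combination of TRT and IMPROVE and not enough data for a CMH test. (At baseline, IMPROVE is missing for every subject, so that BY group will show this note even after sorting.)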
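Before running the CMH test, a quick crosstab can show which visits actually have usable IMPROVE values. This is a sketch assuming HAVE1 from the question; with the dummy data, the baseline row should show all six records in the missing column.

```sas
/* Count IMPROVE values, including blanks, at each visit */
proc freq data=have1;
   tables visit*improve / missing norow nocol nopercent;
run;
```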
By default, observations with missing values in any TABLES variable are excluded from the analysis, so there is no need to subset the data first. If you want to see the analysis including missing values, add the MISSING option to the TABLES statement. That treats missing values as a valid level of every variable on the TABLES statement.
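To make the two choices concrete, here is a sketch assuming HAVE1 from the question (HAVE1_BYVISIT is an illustrative name). The first PROC FREQ subsets explicitly, which should give the same CMH results for the post-baseline visits as the default behavior (baseline simply drops out, since IMPROVE is always missing there); the second treats a blank IMPROVE as its own response level.

```sas
proc sort data=have1 out=have1_byvisit;
   by visit;
run;

/* 1) Explicit subset: equivalent to the default exclusion of missing IMPROVE */
proc freq data=have1_byvisit;
   where not missing(improve);
   by visit;
   tables area*obese*smoke*trt*improve / cmh;
run;

/* 2) MISSING option: blank IMPROVE is treated as a valid response level */
proc freq data=have1_byvisit;
   by visit;
   tables area*obese*smoke*trt*improve / cmh missing;
run;
```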
Which you should use depends on the actual analysis you want.
proc freq data=sashelp.heart ;
table bp_status*status*sex/cmh ;
output out=want cmh;
run;
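For the individual comparisons against placebo mentioned in the question, one option is to restrict TRT to one active arm plus placebo and save the statistics with the same OUTPUT OUT= ... CMH statement used in the example above. The macro name CMH_VS_PLACEBO, its parameters, and the output data set names are hypothetical, and the TRT values are assumed to match the data exactly ("active 1", "active 2", "placebo").

```sas
/* Hypothetical macro: CMH test of one active arm vs placebo at each visit, */
/* with the CMH statistics saved to a data set via the OUTPUT statement     */
%macro cmh_vs_placebo(active=, out=);
   proc sort data=have1 out=_pair;
      where trt in ("&active", "placebo");
      by visit;
   run;

   proc freq data=_pair;
      by visit;
      tables area*obese*smoke*trt*improve / cmh;
      output out=&out cmh;
   run;
%mend cmh_vs_placebo;

%cmh_vs_placebo(active=active 1, out=cmh_active1)
%cmh_vs_placebo(active=active 2, out=cmh_active2)
```

Limiting each run to two arms keeps every TRT*IMPROVE table within a stratum at most 2 x 2.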