slottemc
Obsidian | Level 7

Hello,

I am new to some of SAS's statistical procedures and am using SAS 9.4. Normally, I compare only one variable at a time between a test and a control group to see whether they differ statistically. Now I want to combine multiple variables and see whether the resulting groups differ between the test and control group (or even amongst each other within the test group).

My variables are things like Region (West Coast, South, Northeast, etc.), age cohort (<30, 30-39, 40-49, etc.), ADI (<25, 25-49, etc.), and so on. As an example, I want to know whether my population in the West Coast, aged 30-49, with an ADI of 25-49, differs statistically from any other combination in either the control or test group.

What is the best way/procedure to do this?

I thank you for any help/insight you can provide.

Example of data:

data data_set;
    input Group $ 1-7 Region $ 8-10 Age_Cohort $ 11-16 Key 17-20 Targeted 21-22 Gap_Closed 23-24;
    datalines;
Control WC 30-39 123 1 1
Target  WC 30-39 456 1 1
Control WC 30-39 789 1 0
Target  WC 30-39 012 1 1
Control WC 40-49 345 1 1
Target  WC 40-49 678 1 0
Control S  50-59 901 1 0
Target  S  50-59 234 1 0
Control S  60-69 567 1 1
Target  S  60-69 890 1 1
;
run;

 

PaigeMiller
Diamond | Level 26

The problem with PROC MULTTEST here is that only one CLASS variable is allowed, while the problem has three CLASS variables, specifically GROUP, REGION, AGE_COHORT (and the text seems to indicate there are more than three, although the data set only has three).

 

So, the problem really seems to be a three-way (or higher) ANOVA, which can be run in PROC GLM (assuming certain conditions are met). The ADJUST= option of the LSMEANS statement would allow one of the different multiple comparison methods to be used.
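A minimal sketch of that idea, using the variable names from the example data step in the original question. It assumes the full data set contains every Group/Region/Age_Cohort combination (the ten-row example does not, so some comparisons there would be non-estimable), and the choice of `adjust=tukey` and of which effect to put on the LSMEANS statement is only illustrative:

```sas
/* Sketch only: three-way ANOVA with adjusted pairwise comparisons.   */
/* Assumes data_set has Group, Region, Age_Cohort and the 0/1         */
/* response Gap_Closed, with every factor combination present.        */
proc glm data=data_set;
    class Group Region Age_Cohort;
    /* The bar operator expands to all main effects and interactions */
    model Gap_Closed = Group|Region|Age_Cohort;
    /* Compare every Group-by-Region cell with a Tukey adjustment */
    lsmeans Group*Region / pdiff=all adjust=tukey;
run;
quit;
```

Additional CLASS variables (such as ADI) would be added to both the CLASS and MODEL statements in the same way.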

--
Paige Miller
slottemc
Obsidian | Level 7

Yes, that is the case, and it is what I was thinking when I was typing my response to Reeza. I will now go research your suggestion: PROC GLM with the ADJUST= option.

Also, I do have more class variables; I was just limiting them for ease of sharing.

 

Thank you!

slottemc
Obsidian | Level 7

Hi again,

I got busy with other projects and am finally returning to this one. Below is the code I ended up with. Fingers crossed I am on the right track?!

Based on the results, it looks like only the target vs. control effect is significant (see attachments). I ran a separate test using PROC FREQ to test region significance between target and control, and it looks like the West Coast targets were statistically different from the West Coast control group. So I was wondering: how exactly do I interpret the two different results from PROC GLM and from the chi-square test in PROC FREQ? Am I not asking the right question with the MODEL statement in PROC GLM?

 

proc glm data=unixwork.col_interim2 outstat=unixwork.stat_sig_testing;
    class bucket region age_cohort adi_cohort;
    model Quest_Closed_Gap = Bucket|Region|age_cohort|ADI_cohort / tolerance;
    lsmeans Bucket|Region|age_cohort|ADI_cohort / pdiff=all adjust=tukey;
run;
quit;

Bucket values are target or control.

Region values are West Coast or South.

Age cohorts are 55-59 or 60-64.

ADI is in quartiles from 0 to 100.

 

/* Sort so the BY statements below can be used */
proc sort data=unixwork.col_interim2;
    by region bucket Quest_Closed_Gap;
run;

/* Sum COL_TARGET within each region / bucket / outcome cell */
proc summary data=unixwork.col_interim2;
    var COL_TARGET;
    by region bucket Quest_Closed_Gap;
    output out=unixwork.region sum=;
run;

/*ods output PdiffCLs=pdiff;*/
/* Region - does target differ from control group? */
proc freq data=unixwork.region;
    by region;
    weight COL_TARGET;
    table bucket * Quest_Closed_Gap / chisq riskdiff;
    output out=unixwork.region_stat chisq;
run;

/* Keep only regions where target and control differ significantly (p < 0.05) */
data unixwork.region_stat_sig;
    set unixwork.region_stat;
    by region;
    if P_PCHI < 0.05;
run;
/* West Coast targets performed statistically better than control; South was similar to control */

Thanks again!

slottemc
Obsidian | Level 7

Thanks for your response. I'm reading up on how exactly to use this procedure, which has given me some follow-up questions.

Based on the way my data is laid out, would region, age cohort, ADI, etc. actually be groups rather than variables? The main measurement I have is gaps closed out of the number targeted (1 = yes, 0 = no), which would then be broken out by region, age, etc. within the test and control groups. The reason I ask is that one article I found states this:

 

"PROC MULTTEST does not provide closed tests, and therefore, caution is urged, in the following situations:
• Multiple comparisons of means involving three or more groups, using permutation resampling
• Multiple comparisons of binary variables involving three or more groups." https://pdfs.semanticscholar.org/93cd/57288bc50ce9bc99aef89f7cab9f61bc3bbb.pdf

 

I'm worried my data might fit that, unless I am just thinking about my data backwards. Like I said, I am new to these statistical procedures, so I need to read more about the test itself.

 

Thanks again.
