1.Access | Adult | 1.8631301731 | 2215 | 1 | 4.17 | 1 |
1.Access | Older Adult | 2.0261437908 | 306 | 1 | 4.17 | 2 |
1.Access | Young Adult | 1.8697916667 | 128 | 1 | 4.17 | 3 |
2.Quality and Appropriateness | Adult | 1.9121645347 | 2215 | 1 | 4.17 | 4 |
2.Quality and Appropriateness | Older Adult | 2.107480029 | 306 | 1 | 4.17 | 5 |
2.Quality and Appropriateness | Young Adult | 1.8784722222 | 128 | 1 | 4.17 | 6 |
Here, agec has 3 categories (Young Adult, Adult, and Older Adult), and Adult_Survey_Results has 2 domains (Access, and Quality and Appropriateness).
I want to answer the question "What is the amount of interaction between the age groups and the survey domain?" Essentially, does age group affect the client's answer to the survey?
I've tried a simple PROC FREQ:
proc freq data=b2_table;
table Adult_Survey_Results*gender / chisq;
run;
But it bases the results on how many rows there are for each Adult_Survey_Results value, whereas I think it needs to be based on the N for each Adult_Survey_Results instead.
How would you go about this?
Thank you.
@bazingarollcall wrote:
You're right, this is a proc means output. I've output this table to its own dataset called b2_table. I am using this same dataset for the proc freq table.
Assuming 'this same dataset' is b2_table, you need to add a WEIGHT N statement to your PROC FREQ, or else use the raw data instead.
proc freq data=b2_table;
table Adult_Survey_Results*gender / chisq;
weight N;
run;
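To see why WEIGHT N works, here is a minimal sketch with small, made-up data (the data set names cell_counts and raw_clients, the variable positive, and all counts are hypothetical). PROC FREQ on one-row-per-client data and PROC FREQ on pre-summarized counts with a WEIGHT statement should produce the same table and the same chi-square statistic.

```sas
/* Hypothetical pre-summarized data: one row per agec x response cell, with a count */
data cell_counts;
   length agec $12 positive $3;
   input agec $ positive $ count;
   datalines;
Young_Adult Yes 30
Young_Adult No 20
Adult Yes 50
Adult No 70
Older_Adult Yes 25
Older_Adult No 15
;
run;

/* Expand the counts into one row per hypothetical client (the "raw data") */
data raw_clients;
   set cell_counts;
   do i = 1 to count;
      output;
   end;
   drop i count;
run;

/* Raw data: every row counts once */
proc freq data=raw_clients;
   tables agec*positive / chisq;
   output out=chi_raw pchi;
run;

/* Summarized data: WEIGHT makes each row count COUNT times */
proc freq data=cell_counts;
   tables agec*positive / chisq;
   weight count;
   output out=chi_weighted pchi;
run;
```

The weighted version only matches the raw-data version when each client is counted in exactly one row of the summarized table, which is the assumption worth checking against b2_table.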
I think my question needs more background information.
The code I used to create the b2_stats data set is as follows:
proc summary data=formatting mean std lclm uclm n noprint;
   class agec gender;
   var mean_func mean_sc mean_acc mean_qa mean_out mean_part mean_sat mean_qol;
   /* no statistic keywords: default statistics, one row per statistic identified by _STAT_ */
   output out=b1_b2_stats;
   /* means only, stored under the original variable names */
   output out=mean1 mean=;
   *output out=uclm1 uclm=;
   *output out=lclm1 lclm=;
run;
data b1_b2_stats2;
format _STAT_ $30.;
set b1_b2_stats
mean1(in=in2) ;
if in2 then _STAT_ = 'Mean';
run;
proc sort data=b1_b2_stats2
out=b1_stats;
by agec _TYPE_ _stat_;
run;
proc transpose data=b1_stats out=b1_han;
by agec _TYPE_;
id _stat_;
run;
data b1_table /*(keep=agec _TYPE_ _STAT_ Adult_Survey_Results Responses Number_Positive Percent_Positive Confidence_Interval)*/;
format Adult_Survey_Results $40.;
set b1_han;
if _NAME_ = 'mean_acc' then Adult_Survey_Results= '1.Access';
else if _NAME_= 'mean_qa' then Adult_Survey_Results= '2.Quality and Appropriateness';
else if _NAME_= 'mean_func' then Adult_Survey_Results= '7.Functioning';
else if _NAME_= 'mean_sat' then Adult_Survey_Results= '5.General Satisfaction';
else if _NAME_= 'mean_out' then Adult_Survey_Results= '3.Outcomes';
else if _NAME_= 'mean_sc' then Adult_Survey_Results= '6.Social Connectedness';
else if _NAME_= 'mean_part' then Adult_Survey_Results= '4.Participation In Treatment Planning';
else if _NAME_= 'mean_qol' then Adult_Survey_Results= '8.Quality of Life Assessment';
else if _NAME_='_FREQ_' and agec='Adult' then Adult_Survey_Results='Adult Overall';
else if _NAME_='_FREQ_' and agec='Older Adult' then Adult_Survey_Results='O.A. Overall';
else if _NAME_='_FREQ_' and agec='Young Adult' then Adult_Survey_Results='Y.A. Overall';
Responses=N;
STD1=STD;
MEAN1=MEAN*100;
*P951=P95;
*LCLM1=round(LCLM*N);
*UCLM1=round(UCLM*N);
*Percent_Positive=(Number_Positive /Responses);
*Confidence_Interval=cats(trim(LCLM1),'-',trim(UCLM1));
run;
The CHISQ option produced the attached table, which is very close to what I need but still not quite right. The WEIGHT statement used N, but PROC FREQ summed N over all of the rows: each set of rows already adds up to the total of the entire dataset (2757), and those totals were then added together, which is incorrect. I need the overall total to stay equal to the total of the entire dataset (2757).
PROC REPORT output for b2_table is also attached.
proc freq data=b2_table;
table Adult_Survey_Results*agec / chisq;
weight N;
run;
I'm thinking that I need to somehow "pluck" the N of each Adult_Survey_Results, along with the agec categories and their frequencies, into another table and use that for the chi-square test. Would this work?
@bazingarollcall wrote:
The chisq command produced the attached table, which is very close to what I need but still not quite right. The weight statement pulled from N, but the program summed the totals of all the rows (which totaled the total of the entire dataset, 2757), and added all of those together, which is incorrect. I need the total to remain the same of the entire dataset (2757).
Look at your output data sets from PROC SUMMARY, such as b1_b2_stats. You will see a variable named _TYPE_ that indicates which combination of the CLASS variables each row summarizes. Since you have two class variables, _TYPE_ has 4 levels: 0, 1, 2, and 3. _TYPE_=0 is the overall summary of all records, 1 is each level of one class variable (the last one listed, gender), and 2 is each level of the other (agec). It is very likely that you want the NWAY option on PROC SUMMARY so that only the _TYPE_=3 rows are kept; those are the actual combinations of levels of both class variables.
Otherwise, the summed N is going to be about 4 times the number of original records, because every record is counted once at each _TYPE_ level.
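A minimal sketch of the _TYPE_ effect, assuming a tiny hypothetical data set (the names clients, score, n_score, mean_score and the six records are made up): without NWAY the summed N comes out at 4 times the number of records, and with NWAY it matches. In the code above, the corresponding change would be adding NWAY to the PROC SUMMARY statement.

```sas
/* Hypothetical raw data: 6 clients, two class variables, one score */
data clients;
   length agec $12 gender $1;
   input agec $ gender $ score;
   datalines;
Young_Adult F 2.0
Young_Adult M 1.5
Adult F 1.8
Adult M 2.1
Older_Adult F 2.2
Older_Adult M 1.9
;
run;

/* Without NWAY: _TYPE_ = 0 (overall), 1 (gender only), 2 (agec only), 3 (agec*gender) */
proc summary data=clients;
   class agec gender;
   var score;
   output out=all_types n=n_score mean=mean_score;
run;

/* With NWAY: only _TYPE_ = 3, the full agec*gender combinations */
proc summary data=clients nway;
   class agec gender;
   var score;
   output out=nway_only n=n_score mean=mean_score;
run;

/* Compare summed N: all_types counts every client 4 times, nway_only once */
proc sql;
   select sum(n_score) as total_n from all_types;
   select sum(n_score) as total_n from nway_only;
quit;
```

If the other _TYPE_ levels are needed elsewhere, another option is to keep the existing code and subset the output with a WHERE _TYPE_=3 condition before running PROC FREQ.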
I am very confused, though, about running a chi-square test with the means of multiple variables as categories. What exact question is this chi-square test supposed to answer? "Amount of interaction" is not what a chi-square test measures. It checks whether the distribution of values of one variable is similar across the levels of the other; in other words, given the counts, are they close to the counts expected if the rows and columns were distributed the same way? It is more of a yes/no test than a "how much" test.
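To illustrate, here is a sketch that uses only the six rows shown at the top of the thread (the data set name b2_example is hypothetical, and the two unlabeled columns after N are left out). Within each domain the N values are identical (2215, 306, 128), so weighting by N gives both domains the same age-group distribution. Every expected count then equals the observed count (for example, 2649 * 4430 / 5298 = 2215), so the chi-square statistic should be 0 with a p-value of 1, and the mean scores never enter the calculation.

```sas
/* The six rows from the top of the thread: domain, age group, mean score, N */
data b2_example;
   length Adult_Survey_Results $40 agec $12;
   infile datalines dlm='|';
   input Adult_Survey_Results $ agec $ mean N;
   datalines;
1.Access|Adult|1.8631301731|2215
1.Access|Older Adult|2.0261437908|306
1.Access|Young Adult|1.8697916667|128
2.Quality and Appropriateness|Adult|1.9121645347|2215
2.Quality and Appropriateness|Older Adult|2.107480029|306
2.Quality and Appropriateness|Young Adult|1.8784722222|128
;
run;

/* Weighting by N: each domain has the same age mix, so observed = expected */
proc freq data=b2_example;
   tables Adult_Survey_Results*agec / chisq expected;
   weight N;
   output out=chi_b2 pchi;
run;
```

Because the same respondents are counted in both domains, this table cannot show any age-by-domain difference; the information about how each age group answered lives in the mean column, which a chi-square test on N ignores.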