bazingarollcall
Fluorite | Level 6
Hello, 
I have the following dataset:
Adult_Survey_Results            agec         MEAN          N     Frequency  Percent  Cumulative
1.Access                        Adult        1.8631301731  2215  1          4.17     1
1.Access                        Older Adult  2.0261437908  306   1          4.17     2
1.Access                        Young Adult  1.8697916667  128   1          4.17     3
2.Quality and Appropriateness   Adult        1.9121645347  2215  1          4.17     4
2.Quality and Appropriateness   Older Adult  2.107480029   306   1          4.17     5
2.Quality and Appropriateness   Young Adult  1.8784722222  128   1          4.17     6

Here agec has 3 categories (Young Adult, Adult, and Older Adult), and Adult_Survey_Results is shown for 2 domains (Access, and Quality and Appropriateness).

I want to answer the question "What is the amount of interaction between the age groups and the survey domain?" Essentially, does age group affect the client's answer to the survey?

 

I've tried a simple PROC FREQ:

proc freq data=b2_table;
table Adult_Survey_Results*gender / chisq;
run;

 

 

But it prints results based on the frequency of each Adult_Survey_Results row, whereas I think I need them based on the N of each Adult_Survey_Results row instead.

How would you go about this?

Thank you.

7 REPLIES
Reeza
Super User
That looks like an output from PROC MEANs. What does your raw data look like? Which data set are you using in PROC FREQ - the output from PROC MEANS or the raw data?
bazingarollcall
Fluorite | Level 6
You're right, this is PROC MEANS output. I've written this table to its own dataset called b2_table, and I am using that same dataset for the PROC FREQ table.
Reeza
Super User

@bazingarollcall wrote:
You're right, this is PROC MEANS output. I've written this table to its own dataset called b2_table, and I am using that same dataset for the PROC FREQ table.

Assuming 'this same dataset' is b2_table, you need to either add a WEIGHT N statement to your PROC FREQ or use the raw data instead.

 

proc freq data=b2_table;
table Adult_Survey_Results*gender / chisq; 
weight N;
run;
ballardw
Super User
It appears that you would use the data set that you used for INPUT to Proc Means as the data for Proc freq.
The tables statement would look like:
Tables Adult_Survey_Results * agec / chisq;
bazingarollcall
Fluorite | Level 6

I think my question needs more background information.

The code I used to create the b2_stats dataset is as follows:

proc summary data=formatting mean std lclm uclm n noprint;
  class agec gender;
  var mean_func mean_sc mean_acc mean_qa mean_out mean_part mean_sat mean_qol;

  /* no statistic keywords here: statistics come back as rows identified by _STAT_ */
  output out=b1_b2_stats;

  /* means only, one column per analysis variable */
  output out=mean1 mean=;
  *output out=uclm1 uclm=;
  *output out=lclm1 lclm=;
run;

data b1_b2_stats2;
format _STAT_ $30.;
set b1_b2_stats
mean1(in=in2) ;
if in2 then _STAT_ = 'Mean';
run;

proc sort data=b1_b2_stats2
out=b1_stats;
by agec _TYPE_ _stat_;
run;

proc transpose data=b1_stats out=b1_han;
by agec _TYPE_;
id _stat_;
run;

data b1_table /*(keep=agec _TYPE_ _STAT_ Adult_Survey_Results Responses Number_Positive Percent_Positive Confidence_Interval)*/;
format Adult_Survey_Results $40.;

set b1_han;
if _NAME_ = 'mean_acc' then Adult_Survey_Results= '1.Access';
else if _NAME_= 'mean_qa' then Adult_Survey_Results= '2.Quality and Appropriateness';
else if _NAME_= 'mean_func' then Adult_Survey_Results= '7.Functioning';
else if _NAME_= 'mean_sat' then Adult_Survey_Results= '5.General Satisfaction';
else if _NAME_= 'mean_out' then Adult_Survey_Results= '3.Outcomes';
else if _NAME_= 'mean_sc' then Adult_Survey_Results= '6.Social Connectedness';
else if _NAME_= 'mean_part' then Adult_Survey_Results= '4.Participation In Treatment Planning';
else if _NAME_= 'mean_qol' then Adult_Survey_Results= '8.Quality of Life Assessment';
else if _NAME_='_FREQ_' and agec='Adult' then Adult_Survey_Results='Adult Overall';
else if _NAME_='_FREQ_' and agec='Older Adult' then Adult_Survey_Results='O.A. Overall';
else if _NAME_='_FREQ_' and agec='Young Adult' then Adult_Survey_Results='Y.A. Overall';

Responses=N;
STD1=STD;
MEAN1=MEAN*100;
*P951=P95;
*LCLM1=round(LCLM*N);
*UCLM1=round(UCLM*N);
*Percent_Positive=(Number_Positive /Responses);
*Confidence_Interval=cats(trim(LCLM1),'-',trim(UCLM1));
run;


The CHISQ option produced the attached table, which is very close to what I need but still not quite right. The WEIGHT statement did pull from N, but PROC FREQ summed N over the rows of each domain (each of which already adds up to the entire dataset, 2757) and then added all of those sums together, which is incorrect. I need the grand total to stay at the size of the entire dataset (2757).


Proc report of b2_table is also attached.

proc freq data=b2_table;
table Adult_Survey_Results*agec / chisq;
weight N;
run;

I'm thinking that I need to somehow "pluck" the N of each Adult_Survey_Results , along with the agec categories and their frequencies, into another table and use this for chisq. Would this work?

ballardw
Super User

@bazingarollcall wrote:


The CHISQ option produced the attached table, which is very close to what I need but still not quite right. The WEIGHT statement did pull from N, but PROC FREQ summed N over the rows of each domain (each of which already adds up to the entire dataset, 2757) and then added all of those sums together, which is incorrect. I need the grand total to stay at the size of the entire dataset (2757).

 

Look at your output data sets from PROC SUMMARY, such as b1_b2_stats. You will see a variable named _TYPE_ that indicates which combination of the CLASS variables each row summarizes. Since you have two CLASS variables, _TYPE_ takes 4 values: 0, 1, 2 and 3. Type 0 is the overall summary of all records, 1 is each level of one of the class variables, and 2 is each level of the other class variable. It is very likely that you want to use the NWAY option on PROC SUMMARY to keep only the _TYPE_ = 3 rows, which are the actual combinations of the levels of both class variables.

Otherwise N is going to be about 4 times the number of original records.
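
As a rough sketch of the NWAY suggestion, assuming the same formatting data set and CLASS variables as in your PROC SUMMARY step (nway_stats is a made-up output name, and only two of the eight analysis variables are listed to keep it short):

```sas
/* Sketch only: nway_stats is a hypothetical output data set name. */
proc summary data=formatting nway;
  class agec gender;
  var mean_acc mean_qa;
  /* AUTONAME appends the statistic to each variable name,
     e.g. mean_acc_Mean and mean_acc_N */
  output out=nway_stats mean= n= / autoname;
run;

/* Every row should now have _TYPE_ = 3 (agec by gender combinations only) */
proc print data=nway_stats;
run;
```

If the final table is meant to have one row per age group rather than per agec by gender combination, listing only agec in CLASS may be closer to what you need; that is a guess about the intended layout.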

 

I am very confused about doing a chi-square test with the means of multiple variables as categories, though. What is the exact question this test is supposed to answer? "Amount of interaction" is not what a chi-square test measures. It checks whether the distribution of one variable is similar across the levels of another. In other words: given the observed counts, are they close to the counts you would expect if the rows and columns were distributed the same way? It is more of a yes/no test than a "how much" test.
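
To see what the test actually compares, one option is to run it on two categorical variables from the raw data and print the expected counts next to the observed ones. A sketch, assuming formatting holds one row per respondent with agec and gender (pairing these two is only for illustration):

```sas
/* Sketch only: assumes formatting is the raw, one-row-per-respondent data. */
proc freq data=formatting;
  /* EXPECTED prints the count each cell would have if agec and gender
     were unrelated; CHISQ tests how far the observed counts are from those */
  tables agec*gender / chisq expected;
run;
```

No WEIGHT statement is needed here because each raw record already counts once.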

bazingarollcall
Fluorite | Level 6
I thought about your response overnight and can't thank you enough for it.

I no longer think chisq is appropriate in this situation; I will need something like a t-test to determine whether there is a significant difference between the means of the 3 agec groups.
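
A note on that last point: a t-test compares two groups at a time, so with three levels of agec a one-way ANOVA is the usual starting point. A minimal sketch, assuming the raw data set formatting (one row per respondent) and using mean_acc as an example score; the Tukey adjustment is my choice here, not something from the thread:

```sas
/* Sketch only: run on the raw data, not on the summarized b2_table. */
proc glm data=formatting;
  class agec;
  model mean_acc = agec;   /* overall F test: do the three group means differ? */
  means agec / tukey;      /* pairwise comparisons between the age groups */
run;
quit;
```

The same step can be repeated for the other scores (mean_qa, mean_out, and so on) by changing the response variable in the MODEL statement.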

