I have a question that I don't really know how to phrase.
I have a variable called course_id that identifies the training site each participant went through. There are multiple training sites, and therefore multiple course_ids corresponding to those sites. I would like to run a chi-square analysis on the number of different training sites, but when I use the course_id variable it does not count each site once; instead it gives me the number of participants at each site.
I was wondering if there is a way to count the number of trainings, rather than the total number of participants, using the course_id variable.
For example, if I have the following data set:
course_id  location_of_site
00211      rural
00211      rural
00211      rural
33455      urban
33455      urban
33455      urban
66778      rural
66778      rural
I would like to end up with:
number_trainings_rural  number_trainings_urban
2                       1
I hope I gave enough information and made this clear enough to understand my goal.
Thank you for your time,
Donald S.
If you reduce your data down to a single dimension, I don't think there is any way to compute a chi-square from that. Perhaps this alternative is what you are looking for. Reduce the data down to two dimensions, eliminating the counting of participants. Then try for a chi-square on the remaining data. For example:
proc freq data=have;
tables course_id * location_of_site / noprint out=counts;
run;
proc freq data=counts;
tables course_id * location_of_site / chisq;
run;
The first PROC FREQ generates COUNTS, holding a single record for each combination of COURSE_ID / LOCATION_OF_SITE. The second PROC FREQ uses that as input to compute chi square statistics.
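To try this on the sample data from the question, here is a minimal sketch. It assumes the data set is named HAVE, as in the code above, and that course_id is stored as a character variable. The final one-way table is an addition for illustration, not part of the suggestion above: because there is no WEIGHT statement, each record in COUNTS is counted once, so the table tallies trainings rather than participants.

```sas
/* Sample data from the question. course_id is read as character
   (an assumption) so that leading zeros such as 00211 are kept. */
data have;
    input course_id $ location_of_site $;
    datalines;
00211 rural
00211 rural
00211 rural
33455 urban
33455 urban
33455 urban
66778 rural
66778 rural
;
run;

/* One record per course_id / location_of_site combination. */
proc freq data=have;
    tables course_id * location_of_site / noprint out=counts;
run;

/* No WEIGHT statement, so each record in COUNTS counts once:
   this tallies trainings per location, not participants. */
proc freq data=counts;
    tables location_of_site;
run;
```

With this sample data, COUNTS holds three records (one per course_id), and the one-way table should show a frequency of 2 for rural and 1 for urban, which matches the result asked for in the question.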
Something like this?
proc sql;
select location_of_site, count(distinct course_id) as no
from have
group by location_of_site;
quit;
Yes, this is exactly what I am looking for! Wonderful, thank you, Astounding.