So I have a question I don’t really know how to phrase.
I have a variable called course_id that contains the identification for all participants who have gone through a training at a given site. There are multiple training sites and therefore multiple course_ids that correspond to those sites. I would like to run a chi-square analysis on the number of different training sites, but when I use the course_id variable it does not combine rows from the same site. Instead it gives me the number of participants at each site via their course_id.
I was wondering if there is a way to count the number of trainings, rather than the total number of participants, using the course_id variable.
For example if I have the following data set:
course_id location_of_site
00211 rural
00211 rural
00211 rural
33455 urban
33455 urban
33455 urban
66778 rural
66778 rural
I would like to get the following:
number_trainings_rural number_trainings_urban
2 1
I hope I gave enough information and made this clear enough to understand my goal.
Thank you for your time,
Donald S.
If you reduce your data down to a single dimension, I don't think there is any way to compute a chi-square from that. Perhaps this alternative is what you are looking for. Reduce the data down to two dimensions, eliminating the counting of participants. Then try for a chi-square on the remaining data. For example:
proc freq data=have;
tables course_id * location_of_site / noprint out=counts;
run;
proc freq data=counts;
tables course_id * location_of_site / chisq;
run;
The first PROC FREQ generates COUNTS, holding a single record for each combination of COURSE_ID / LOCATION_OF_SITE. The second PROC FREQ uses that as input to compute chi-square statistics.
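To see what the first step produces, here is a minimal sketch using the sample data from the question. It assumes course_id is read as a character variable (so the leading zeros survive) and that the data set is named HAVE, as in the code above. Because the second PROC FREQ has no WEIGHT statement, each row of COUNTS is treated as one training rather than as a group of participants.

```sas
/* Sample data from the question; course_id read as character */
data have;
  input course_id $ location_of_site $;
  datalines;
00211 rural
00211 rural
00211 rural
33455 urban
33455 urban
33455 urban
66778 rural
66778 rural
;
run;

/* One record per course_id / location_of_site combination */
proc freq data=have;
  tables course_id * location_of_site / noprint out=counts;
run;

/* No WEIGHT statement: each row of COUNTS counts once,  */
/* so this tabulates trainings, not participants         */
proc freq data=counts;
  tables location_of_site;
run;
```

With this sample, COUNTS holds three rows, and the last step reports two rural trainings and one urban training.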
proc sql;
select location_of_site, count(distinct course_id) as no
from have
group by location_of_site;
quit;
?
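If the distinct counts themselves are what should be tested, one possible sketch is to save the query result and pass it to PROC FREQ with a WEIGHT statement. The table name site_counts and the column name n_trainings are made up for illustration. With CHISQ on a one-way table, PROC FREQ tests for equal proportions across the levels, which may or may not be the hypothesis you have in mind.

```sas
/* Count distinct trainings per location (hypothetical names) */
proc sql;
  create table site_counts as
  select location_of_site,
         count(distinct course_id) as n_trainings
  from have
  group by location_of_site;
quit;

/* WEIGHT tells PROC FREQ that each row stands for n_trainings units; */
/* CHISQ on a one-way table requests a test of equal proportions      */
proc freq data=site_counts;
  weight n_trainings;
  tables location_of_site / chisq;
run;
```

With only a handful of trainings per location, expect SAS to warn about small cell counts, so treat the result with caution.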
Yes, this is exactly what I am looking for! Wonderful, thank you Astounding.