daszlosek
Quartz | Level 8

So I have a question I don’t really know how to phrase.

I have a variable called course_id that identifies the training each participant went through; each training took place at a different site. There are multiple training sites and therefore multiple course_ids corresponding to those sites. I would like to run a chi-square analysis on the number of different training sites, but when I use the course_id variable it does not collapse the rows belonging to the same site. Instead, it gives me the number of participants at each site via their course_id.

I was wondering if there is a way to count the number of trainings, instead of the total number of participants, using the course_id variable.

For example, if I have the following data set:

course_id    location_of_site
00211        rural
00211        rural
00211        rural
33455        urban
33455        urban
33455        urban
66778        rural
66778        rural

I would like to end up with:

number_trainings_rural    number_trainings_urban
2                         1

I hope I gave enough information and made this clear enough to understand my goal.

Thank you for your time,

Donald S.

Astounding
PROC Star

If you reduce your data down to a single dimension, I don't think there is any way to compute a chi-square from that.  Perhaps this alternative is what you are looking for.  Reduce the data down to two dimensions, eliminating the counting of participants.  Then try for a chi-square on the remaining data.  For example:

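/* Step 1: collapse to one record per COURSE_ID / LOCATION_OF_SITE combination */
proc freq data=have;
   tables course_id * location_of_site / noprint out=counts;
run;

/* Step 2: no WEIGHT statement, so each combination counts once, not once per participant */
proc freq data=counts;
   tables course_id * location_of_site / chisq;
run;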

The first PROC FREQ generates COUNTS, holding a single record for each combination of COURSE_ID / LOCATION_OF_SITE.  The second PROC FREQ uses that as input to compute chi square statistics.
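
To make the counting step concrete, here is a minimal sketch using the sample rows from the question. The data set name HAVE comes from the code above; reading course_id as a character variable (to keep the leading zeros) is an assumption about how the real data is stored. The one-way table on COUNTS is only meant to show the number of trainings per location type and is not part of the chi-square step.

```sas
/* Sample data from the question; course_id is read as character */
/* so the leading zeros are preserved (an assumption).            */
data have;
   input course_id $ location_of_site $;
   datalines;
00211 rural
00211 rural
00211 rural
33455 urban
33455 urban
33455 urban
66778 rural
66778 rural
;
run;

/* One record per observed course_id / location_of_site combination */
proc freq data=have;
   tables course_id * location_of_site / noprint out=counts;
run;

/* No WEIGHT statement: each training is counted once, */
/* regardless of how many participants it had.         */
proc freq data=counts;
   tables location_of_site;
run;
```

With these eight rows, COUNTS holds three records (one per course_id), and the last table reports 2 rural trainings and 1 urban training.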


LinusH
Tourmaline | Level 20

proc sql;
   select location_of_site, count(distinct course_id) as no
   from have
   group by location_of_site;
quit;

?
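
If the wide layout from the question is wanted (one column per location type), the same distinct count can be sketched as below. The table name site_counts is hypothetical, and the literal values 'rural' and 'urban' are assumed to match the stored values exactly, including case.

```sas
proc sql;
   /* A CASE with no ELSE returns missing for non-matching rows, */
   /* and COUNT(DISTINCT ...) ignores missing values.            */
   create table site_counts as
   select count(distinct case when location_of_site = 'rural'
                              then course_id end) as number_trainings_rural,
          count(distinct case when location_of_site = 'urban'
                              then course_id end) as number_trainings_urban
   from have;
quit;
```

SAS writes a note to the log about the missing ELSE clause; that is expected here.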

Data never sleeps

daszlosek
Quartz | Level 8

Yes, this is exactly what I am looking for! Wonderful, thank you Astounding.
