daszlosek
Quartz | Level 8

So I have a question I don’t really know how to phrase.

I have a variable called course_id that contains the identifier for all participants who have gone through a training at a given site. There are multiple training sites, and therefore multiple course_ids that correspond to those sites. I would like to run a chi-square analysis on the number of different training sites, but when I use the course_id variable it does not collapse the rows that belong to the same site; instead it gives me the number of participants at each site via their course_id.

I was wondering if there is a way to count the number of trainings, instead of the total number of participants, using the course_id variable.

For example if I have the following data set:

course_id    location_of_site
00211        rural
00211        rural
00211        rural
33455        urban
33455        urban
33455        urban
66778        rural
66778        rural

I would like to end up with:

number_trainings_rural    number_trainings_urban
2                         1

I hope I gave enough information and made this clear enough to understand my goal.

Thank you for your time,

Donald S.

1 ACCEPTED SOLUTION

Astounding
PROC Star

If you reduce your data down to a single dimension, I don't think there is any way to compute a chi-square from that.  Perhaps this alternative is what you are looking for.  Reduce the data down to two dimensions, eliminating the counting of participants.  Then try for a chi-square on the remaining data.  For example:

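/* Step 1: collapse to one record per course_id / location_of_site pair */
proc freq data=have;
  tables course_id * location_of_site / noprint out=counts;
run;

/* Step 2: no WEIGHT statement, so each record in COUNTS counts once */
proc freq data=counts;
  tables course_id * location_of_site / chisq;
run;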

The first PROC FREQ generates COUNTS, holding a single record for each combination of COURSE_ID / LOCATION_OF_SITE.  The second PROC FREQ uses that as input to compute chi square statistics.
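
To check the two-step idea against the sample data in the question, here is a minimal sketch. The DATA step that builds HAVE and the one-way table on location_of_site are illustrative additions, not part of the reply above; they assume course_id should be read as a character variable so that the leading zeros survive.

```sas
/* Assumption: sample rows copied from the question; course_id is
   read as character ($) so that leading zeros are preserved. */
data have;
  input course_id $ location_of_site $;
  datalines;
00211 rural
00211 rural
00211 rural
33455 urban
33455 urban
33455 urban
66778 rural
66778 rural
;
run;

/* Step 1: one record per course_id / location_of_site combination.
   Without the SPARSE option, zero-count combinations are not written. */
proc freq data=have;
  tables course_id * location_of_site / noprint out=counts;
run;

/* Step 2: count trainings per location. No WEIGHT statement is used,
   so each record in COUNTS contributes 1 rather than its participant count. */
proc freq data=counts;
  tables location_of_site;
run;
```

With these eight rows, COUNTS should hold three records, and the final table should report 2 trainings for rural and 1 for urban, which matches the result requested in the question.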


3 REPLIES
LinusH
Tourmaline | Level 20

proc sql;
  select location_of_site, count(distinct course_id) as no
  from have
  group by location_of_site;
quit;

?

Data never sleeps

daszlosek
Quartz | Level 8

Yes, this is exactly what I am looking for! Wonderful, thank you Astounding.
