Solved: How to Count a variable

daszlosek · Posted 09-16-2014 11:35 AM

So I have a question I don’t really know how to phrase.

I have a variable called course_id that records, for every participant, which training they went through; the trainings were held at different sites. There are multiple training sites and therefore multiple course_ids corresponding to those sites. I would like to run a chi-square analysis on the number of different trainings, but when I use the course_id variable it does not combine similar sites. Instead, it gives me the number of participants at those sites by course_id.

I was wondering if there is a way to count the number of trainings, rather than the total number of participants, using the course_id variable.

For example, if I have the following data set:

course_id   location_of_site
00211       rural
33455       urban
66778       rural

I would like to end up with:

number_trainings_rural   number_trainings_urban
2                        1

I hope I gave enough information and made this clear enough to understand my goal.

Thank you for your time,

Donald S.

Astounding · Posted 09-16-2014 11:56 AM

If you reduce your data to a single dimension, I don't think there is any way to compute a chi-square from it. Perhaps this alternative is what you are looking for: reduce the data to two dimensions, eliminating the counting of participants, then compute a chi-square on the remaining data. For example:

proc freq data=have;
  tables course_id * location_of_site / noprint out=counts;
run;

proc freq data=counts;
  tables course_id * location_of_site / chisq;
run;

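The first PROC FREQ generates COUNTS, holding a single record for each combination of COURSE_ID / LOCATION_OF_SITE. The second PROC FREQ uses that as input to compute chi-square statistics. Because the second step has no WEIGHT statement, each record in COUNTS counts once, so its frequencies are numbers of trainings rather than numbers of participants.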
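Applied to the example from the question, the approach can be sketched end to end as below. This is a minimal sketch, not the poster's exact code: the repeated participant rows are hypothetical (the question lists only one row per course), and TRAININGS_BY_SITE is an illustrative name. The last step is a one-way table on COUNTS, which reports trainings per location rather than participants per location.

```sas
/* Hypothetical participant-level data: one row per participant,     */
/* so a course_id repeats once for every attendee of that training.  */
data have;
  length course_id $5 location_of_site $5;  /* character keeps the leading zeros in 00211 */
  input course_id $ location_of_site $;
  datalines;
00211 rural
00211 rural
33455 urban
66778 rural
66778 rural
66778 rural
;
run;

/* Step 1: collapse to one record per course_id / location_of_site.  */
/* By default OUT= keeps only combinations that actually occur.      */
proc freq data=have;
  tables course_id * location_of_site / noprint out=counts;
run;

/* Step 2: count records in COUNTS by location. With no WEIGHT       */
/* statement, each training counts once however many people attended. */
proc freq data=counts;
  tables location_of_site / out=trainings_by_site;
run;
```

With these rows, TRAININGS_BY_SITE should show 2 rural and 1 urban trainings, matching the counts in the question, whereas a one-way table of location_of_site on HAVE would show 5 rural and 1 urban participants.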


LinusH · Posted 09-16-2014 11:47 AM

Something like this?

proc sql;
  select location_of_site, count(distinct course_id) as no
    from have
    group by location_of_site;
quit;
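The query above prints one row per location. If you want the single-row layout from the question (number_trainings_rural, number_trainings_urban), one possible sketch uses a conditional COUNT(DISTINCT ...). It assumes the locations are known in advance to be 'rural' and 'urban'; the data rows and the table name TRAINING_COUNTS are hypothetical.

```sas
/* Hypothetical participant-level data (one row per participant). */
data have;
  length course_id $5 location_of_site $5;
  input course_id $ location_of_site $;
  datalines;
00211 rural
00211 rural
33455 urban
66778 rural
;
run;

/* One output row, one column per location. A CASE with no ELSE      */
/* returns a missing value for other locations, and COUNT(DISTINCT)  */
/* ignores missing values, so each column counts distinct trainings. */
proc sql;
  create table training_counts as
    select count(distinct case when location_of_site = 'rural' then course_id end)
             as number_trainings_rural,
           count(distinct case when location_of_site = 'urban' then course_id end)
             as number_trainings_urban
      from have;
quit;
```

Hard-coding the levels keeps the query short; with many sites, the one-row-per-location result (optionally reshaped with PROC TRANSPOSE) scales better.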


daszlosek · Posted 09-16-2014 12:08 PM

Yes, this is exactly what I am looking for! Wonderful, thank you Astounding.
