I'm trying to find the frequencies for only unique ID numbers. I tried PROC FREQ, but couldn't figure out how to do whatever the SAS equivilant of SELECT DISTINCT is. I ran the following code and got numebrs that don't add up.
Code:
PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n;
Result:
20599 |
Code:
PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '1a (obs): Demonstrating knowledge of content and pedagogy'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '1a (p&p): Demonstrating knowledge of content and pedagogy'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '1e (obs): Designing coherent instruction'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '1e (p&p): Designing coherent instruction'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '2a: Creating an environment of respect and rapport'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '2d: Managing student behavior'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '3b: Using questioning and discussion techniques'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '3c: Engaging students in learning'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '3d: Using assessment in instruction'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '4e (obs): Growing and developing professionally'; PROC SQL; SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n WHERE MOTPComponentDescription = '4e (p&p): Growing and developing professionally';
Result:
1a (obs): Demonstrating knowledge of content and pedagogy: 700
1a (p&p): Demonstrating knowledge of content and pedagogy: 606
1e (obs): Designing coherent instruction: 15622
1e (p&p): Designing coherent instruction: 1135
2a: Creating an environment of respect and rapport: 2466
2d: Managing student behavior: 1005
3b: Using questioning and discussion techniques: 808
3c: Engaging students in learning: 2516
3d: Using assessment in instruction: 3058
4e (obs): Growing and developing professionally: 5245
4e (p&p): Growing and developing professionally: 588
SUM = 33746
33746 != 20599
My data set looks like this (had to x out PII):
MOTPID | MOTPComponentDescription | SchoolDBN | EmployeeID | FirstName | LastName | EvaluatorUserID | EvaluatorName | Comments | complev_flag_ct | sentence_count | percent_similar_sentences | Rating | MOTPDate | Y16_Overall_Rating |
1051875 | 1e (obs): Designing coherent instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 3 | 0.00% | N/A | 9/13/2016 0:00 | Ineffective | |
1051875 | 1e (obs): Designing coherent instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 3 | 0.00% | N/A | 9/13/2016 0:00 | Ineffective | |
1051875 | 1e (obs): Designing coherent instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 3 | 0.00% | N/A | 9/13/2016 0:00 | Ineffective | |
1051876 | 3d: Using assessment in instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 4 | 0.00% | N/A | 9/13/2016 0:00 | Developing | |
1051876 | 3d: Using assessment in instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 4 | 0.00% | N/A | 9/13/2016 0:00 | Developing | |
1051876 | 3d: Using assessment in instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 4 | 0.00% | N/A | 9/13/2016 0:00 | Developing | |
1051876 | 3d: Using assessment in instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 4 | 0.00% | N/A | 9/13/2016 0:00 | Developing | |
1051876 | 2a: Creating an environment of respect and rapport | 13K527 | xxx | xxx | xxx | xxx | xxx | 1 | 5 | 20.00% | N/A | 9/13/2016 0:00 | Developing | |
1051878 | 1e (obs): Designing coherent instruction | 27Q400 | xxx | xxx | xxx | xxx | xxx | 0 | 6 | 0.00% | N/A | 9/13/2016 0:00 | Effective | |
1051878 | 1e (obs): Designing coherent instruction | 27Q400 | xxx | xxx | xxx | xxx | xxx | 0 | 6 | 0.00% | N/A | 9/13/2016 0:00 | Effective | |
1051878 | 1e (obs): Designing coherent instruction | 27Q400 | xxx | xxx | xxx | xxx | xxx | 0 | 6 | 0.00% | N/A | 9/13/2016 0:00 | Effective | |
1051886 | 1e (obs): Designing coherent instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 6 | 0.00% | N/A | 9/14/2016 0:00 | Developing | |
1051886 | 1e (obs): Designing coherent instruction | 13K527 | xxx | xxx | xxx | xxx | xxx | 0 | 6 | 0.00% | N/A | 9/14/2016 0:00 | Developing | |
Looking for any ideas on what went wrong or if there's a better way to get my desired result (the count of unique MOTPID's by MOTPCopmponentDescription. Thanks so much in advance!
Thanks so much. Maybe my post was unclear. I'm not interested in how many MOTPID's had 1, 2, or 3 observations, but rather for a given MOTPComponentDescription how many unique MOTPIDs are there.
Either way, I found a solution:
data comment_analysis;
set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;
proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;
proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;
what does
proc freq data= WORK.'0__1_MOTP_COMMENTS_0000'n;
tables MOTPID*MOTPComponentDescription /list missing;
run;
give you?
I suspect the issue you see comes from the way you are using distinct.
What about
PROC SQL;
SELECT MOTPID, MOTPComponentDescription , count(*) as count
FROM WORK.'0__1_MOTP_COMMENTS_0000'n
group byMOTPID, MOTPComponentDescription;
quit;
Here's a set of two PROC FREQs that (together) give what you asked plus more. I'm not sure if you're interested in the "plus more" part but you might be.
proc freq data=have noprint;
tables MOTPComponentDescription * MOTPID / out=summarized_counts;
run;
proc freq data=summarized_counts;
tables MOTPComponentDescription * Count / missing list;
run;
This describes your original data set as follows. For each MOTPComponentDescription, how many MOTPIDs had a single observation? (That's the part you asked for.) How many have 2 observations? How many have 3 observations? It gives you that summary for whatever counts actually appear in the data set.
Thanks so much. Maybe my post was unclear. I'm not interested in how many MOTPID's had 1, 2, or 3 observations, but rather for a given MOTPComponentDescription how many unique MOTPIDs are there.
Either way, I found a solution:
data comment_analysis;
set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;
proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;
proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;
You can also simplify your SQL query.
PROC SQL;
SELECT MOTPComponentDescription, COUNT (DISTINCT MOTPID) as count
FROM WORK.'0__1_MOTP_COMMENTS_0000'n
group by MOTPComponentDescription;
quit;
EDIT: unless you have an explicit exclusion criteria, ie a MOTPID can only be per one MOTPComponentDescription then the total doesn't have to add up to the distinct count you found earlier. You can have a single MOTPID across different MOTPComponentDescription, which means the sum can be greater than the overall unique count.
Thanks so much for all of your responses. I managed to figure out the following solution, which got the exact result I was looking for:
data comment_analysis;
set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;
proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;
proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;
@LizGagne Do note that the SQL solution is the same answer, without multiple steps.
Many ways to skin the cat here. My two-step PROC FREQ also gets those numbers if you simplify the second PROC FREQ:
proc freq data=summarized_counts;
tables MOTPComponentDescription;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.