BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LizGagne
Obsidian | Level 7

I'm trying to find the frequencies for only unique ID numbers.  I tried PROC FREQ, but couldn't figure out how to do whatever the SAS equivilant of SELECT DISTINCT is.  I ran the following code and got numebrs that don't add up.

 

 

Code:

 

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n;

 

Result: 

20599

 

Code:

 

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '1a (obs): Demonstrating knowledge of content and pedagogy';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '1a (p&p): Demonstrating knowledge of content and pedagogy';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '1e (obs): Designing coherent instruction';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '1e (p&p): Designing coherent instruction';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '2a: Creating an environment of respect and rapport';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '2d: Managing student behavior';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '3b: Using questioning and discussion techniques';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '3c: Engaging students in learning';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '3d: Using assessment in instruction';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '4e (obs): Growing and developing professionally';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '4e (p&p): Growing and developing professionally';

 

 

Result:

1a (obs): Demonstrating knowledge of content and pedagogy: 700
1a (p&p): Demonstrating knowledge of content and pedagogy: 606
1e (obs): Designing coherent instruction: 15622
1e (p&p): Designing coherent instruction: 1135
2a: Creating an environment of respect and rapport: 2466
2d: Managing student behavior: 1005
3b: Using questioning and discussion techniques: 808
3c: Engaging students in learning: 2516
3d: Using assessment in instruction: 3058
4e (obs): Growing and developing professionally: 5245
4e (p&p): Growing and developing professionally: 588

 

SUM = 33746

33746 != 20599

 

My data set looks like this (had to x out PII):

MOTPIDMOTPComponentDescriptionSchoolDBNEmployeeIDFirstNameLastNameEvaluatorUserIDEvaluatorNameCommentscomplev_flag_ctsentence_countpercent_similar_sentencesRatingMOTPDateY16_Overall_Rating
10518751e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 030.00%N/A9/13/2016 0:00Ineffective
10518751e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 030.00%N/A9/13/2016 0:00Ineffective
10518751e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 030.00%N/A9/13/2016 0:00Ineffective
10518763d: Using assessment in instruction13K527xxxxxxxxxxxxxxx 040.00%N/A9/13/2016 0:00Developing
10518763d: Using assessment in instruction13K527xxxxxxxxxxxxxxx 040.00%N/A9/13/2016 0:00Developing
10518763d: Using assessment in instruction13K527xxxxxxxxxxxxxxx 040.00%N/A9/13/2016 0:00Developing
10518763d: Using assessment in instruction13K527xxxxxxxxxxxxxxx 040.00%N/A9/13/2016 0:00Developing
10518762a: Creating an environment of respect and rapport13K527xxxxxxxxxxxxxxx 1520.00%N/A9/13/2016 0:00Developing
10518781e (obs): Designing coherent instruction27Q400xxxxxxxxxxxxxxx 060.00%N/A9/13/2016 0:00Effective
10518781e (obs): Designing coherent instruction27Q400xxxxxxxxxxxxxxx 060.00%N/A9/13/2016 0:00Effective
10518781e (obs): Designing coherent instruction27Q400xxxxxxxxxxxxxxx 060.00%N/A9/13/2016 0:00Effective
10518861e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 060.00%N/A9/14/2016 0:00Developing
10518861e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 060.00%N/A9/14/2016 0:00Developing
               

 

Looking for any ideas on what went wrong or if there's a better way to get my desired result (the count of unique MOTPID's by MOTPCopmponentDescription.  Thanks so much in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
LizGagne
Obsidian | Level 7

Thanks so much.  Maybe my post was unclear.  I'm not interested in how many MOTPID's had 1, 2, or 3 observations, but rather for a given MOTPComponentDescription how many unique MOTPIDs are there.  

 

Either way, I found a solution:

 

data comment_analysis;
	set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;

proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;

proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;

View solution in original post

7 REPLIES 7
ballardw
Super User

what does

proc freq data= WORK.'0__1_MOTP_COMMENTS_0000'n;

    tables MOTPID*MOTPComponentDescription /list missing;

run;

 

give you?

 

I suspect the issue you see comes from the way you are using distinct.

What about

PROC SQL;
   SELECT MOTPID, MOTPComponentDescription , count(*) as count

   FROM WORK.'0__1_MOTP_COMMENTS_0000'n

   group byMOTPID, MOTPComponentDescription;

quit;

 

Astounding
PROC Star

Here's a set of two PROC FREQs that (together) give what you asked plus more.  I'm not sure if you're interested in the "plus more" part but you might be.

 

proc freq data=have noprint;

tables MOTPComponentDescription * MOTPID / out=summarized_counts;

run;

 

proc freq data=summarized_counts;

tables MOTPComponentDescription * Count / missing list;

run;

 

This describes your original data set as follows.  For each MOTPComponentDescription, how many MOTPIDs had a single observation?  (That's the part you asked for.)  How many have 2 observations?  How many have 3 observations?  It gives you that summary for whatever counts actually appear in the data set.

LizGagne
Obsidian | Level 7

Thanks so much.  Maybe my post was unclear.  I'm not interested in how many MOTPID's had 1, 2, or 3 observations, but rather for a given MOTPComponentDescription how many unique MOTPIDs are there.  

 

Either way, I found a solution:

 

data comment_analysis;
	set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;

proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;

proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;
Reeza
Super User

You can also simplify your SQL query. 

 

PROC SQL;
SELECT MOTPComponentDescription, COUNT (DISTINCT MOTPID) as count
FROM WORK.'0__1_MOTP_COMMENTS_0000'n
group by MOTPComponentDescription;
quit;

 

EDIT: unless you have an explicit exclusion criteria, ie a MOTPID can only be per one MOTPComponentDescription then the total doesn't have to add up to the distinct count you found earlier. You can have a single MOTPID across different MOTPComponentDescription, which means the sum can be greater than the overall unique count. 

LizGagne
Obsidian | Level 7

Thanks so much for all of your responses.  I managed to figure out the following solution, which got the exact result I was looking for:

 

data comment_analysis;
	set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;

proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;

proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;
Reeza
Super User

@LizGagne Do note that the SQL solution is the same answer, without multiple steps. 

Astounding
PROC Star

Many ways to skin the cat here.  My two-step PROC FREQ also gets those numbers if you simplify the second PROC FREQ:

 

proc freq data=summarized_counts;

tables MOTPComponentDescription;

run;

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1823 views
  • 0 likes
  • 4 in conversation