Help using Base SAS procedures

Frequencies not adding up

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 11
Accepted Solution

Frequencies not adding up

I'm trying to find the frequencies for only unique ID numbers.  I tried PROC FREQ, but couldn't figure out how to do whatever the SAS equivilant of SELECT DISTINCT is.  I ran the following code and got numebrs that don't add up.

 

 

Code:

 

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n;

 

Result: 

20599

 

Code:

 

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '1a (obs): Demonstrating knowledge of content and pedagogy';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '1a (p&p): Demonstrating knowledge of content and pedagogy';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '1e (obs): Designing coherent instruction';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '1e (p&p): Designing coherent instruction';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '2a: Creating an environment of respect and rapport';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '2d: Managing student behavior';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '3b: Using questioning and discussion techniques';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '3c: Engaging students in learning';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '3d: Using assessment in instruction';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '4e (obs): Growing and developing professionally';

PROC SQL;
SELECT COUNT (DISTINCT MOTPID) FROM WORK.'0__1_MOTP_COMMENTS_0000'n
WHERE MOTPComponentDescription = '4e (p&p): Growing and developing professionally';

 

 

Result:

1a (obs): Demonstrating knowledge of content and pedagogy: 700
1a (p&p): Demonstrating knowledge of content and pedagogy: 606
1e (obs): Designing coherent instruction: 15622
1e (p&p): Designing coherent instruction: 1135
2a: Creating an environment of respect and rapport: 2466
2d: Managing student behavior: 1005
3b: Using questioning and discussion techniques: 808
3c: Engaging students in learning: 2516
3d: Using assessment in instruction: 3058
4e (obs): Growing and developing professionally: 5245
4e (p&p): Growing and developing professionally: 588

 

SUM = 33746

33746 != 20599

 

My data set looks like this (had to x out PII):

MOTPIDMOTPComponentDescriptionSchoolDBNEmployeeIDFirstNameLastNameEvaluatorUserIDEvaluatorNameCommentscomplev_flag_ctsentence_countpercent_similar_sentencesRatingMOTPDateY16_Overall_Rating
10518751e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 030.00%N/A9/13/2016 0:00Ineffective
10518751e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 030.00%N/A9/13/2016 0:00Ineffective
10518751e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 030.00%N/A9/13/2016 0:00Ineffective
10518763d: Using assessment in instruction13K527xxxxxxxxxxxxxxx 040.00%N/A9/13/2016 0:00Developing
10518763d: Using assessment in instruction13K527xxxxxxxxxxxxxxx 040.00%N/A9/13/2016 0:00Developing
10518763d: Using assessment in instruction13K527xxxxxxxxxxxxxxx 040.00%N/A9/13/2016 0:00Developing
10518763d: Using assessment in instruction13K527xxxxxxxxxxxxxxx 040.00%N/A9/13/2016 0:00Developing
10518762a: Creating an environment of respect and rapport13K527xxxxxxxxxxxxxxx 1520.00%N/A9/13/2016 0:00Developing
10518781e (obs): Designing coherent instruction27Q400xxxxxxxxxxxxxxx 060.00%N/A9/13/2016 0:00Effective
10518781e (obs): Designing coherent instruction27Q400xxxxxxxxxxxxxxx 060.00%N/A9/13/2016 0:00Effective
10518781e (obs): Designing coherent instruction27Q400xxxxxxxxxxxxxxx 060.00%N/A9/13/2016 0:00Effective
10518861e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 060.00%N/A9/14/2016 0:00Developing
10518861e (obs): Designing coherent instruction13K527xxxxxxxxxxxxxxx 060.00%N/A9/14/2016 0:00Developing
               

 

Looking for any ideas on what went wrong or if there's a better way to get my desired result (the count of unique MOTPID's by MOTPCopmponentDescription.  Thanks so much in advance!


Accepted Solutions
Solution
‎01-13-2017 02:10 PM
Occasional Contributor
Posts: 11

Re: Frequencies not adding up

Thanks so much.  Maybe my post was unclear.  I'm not interested in how many MOTPID's had 1, 2, or 3 observations, but rather for a given MOTPComponentDescription how many unique MOTPIDs are there.  

 

Either way, I found a solution:

 

data comment_analysis;
	set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;

proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;

proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;

View solution in original post


All Replies
Super User
Posts: 10,500

Re: Frequencies not adding up

what does

proc freq data= WORK.'0__1_MOTP_COMMENTS_0000'n;

    tables MOTPID*MOTPComponentDescription /list missing;

run;

 

give you?

 

I suspect the issue you see comes from the way you are using distinct.

What about

PROC SQL;
   SELECT MOTPID, MOTPComponentDescription , count(*) as count

   FROM WORK.'0__1_MOTP_COMMENTS_0000'n

   group byMOTPID, MOTPComponentDescription;

quit;

 

Super User
Posts: 5,081

Re: Frequencies not adding up

Here's a set of two PROC FREQs that (together) give what you asked plus more.  I'm not sure if you're interested in the "plus more" part but you might be.

 

proc freq data=have noprint;

tables MOTPComponentDescription * MOTPID / out=summarized_counts;

run;

 

proc freq data=summarized_counts;

tables MOTPComponentDescription * Count / missing list;

run;

 

This describes your original data set as follows.  For each MOTPComponentDescription, how many MOTPIDs had a single observation?  (That's the part you asked for.)  How many have 2 observations?  How many have 3 observations?  It gives you that summary for whatever counts actually appear in the data set.

Solution
‎01-13-2017 02:10 PM
Occasional Contributor
Posts: 11

Re: Frequencies not adding up

Thanks so much.  Maybe my post was unclear.  I'm not interested in how many MOTPID's had 1, 2, or 3 observations, but rather for a given MOTPComponentDescription how many unique MOTPIDs are there.  

 

Either way, I found a solution:

 

data comment_analysis;
	set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;

proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;

proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;
Super User
Posts: 17,819

Re: Frequencies not adding up

[ Edited ]

You can also simplify your SQL query. 

 

PROC SQL;
SELECT MOTPComponentDescription, COUNT (DISTINCT MOTPID) as count
FROM WORK.'0__1_MOTP_COMMENTS_0000'n
group by MOTPComponentDescription;
quit;

 

EDIT: unless you have an explicit exclusion criteria, ie a MOTPID can only be per one MOTPComponentDescription then the total doesn't have to add up to the distinct count you found earlier. You can have a single MOTPID across different MOTPComponentDescription, which means the sum can be greater than the overall unique count. 

Occasional Contributor
Posts: 11

Re: Frequencies not adding up

Thanks so much for all of your responses.  I managed to figure out the following solution, which got the exact result I was looking for:

 

data comment_analysis;
	set WORK.'0__1_MOTP_COMMENTS_0001'n;
run;

proc sort data=comment_analysis nodupkey out=comment_analysis_clean dupout=dups;
by motpid;
run;

proc freq data=comment_analysis_clean;
table MOTPComponentDescription;
run;
Super User
Posts: 17,819

Re: Frequencies not adding up

@LizGagne Do note that the SQL solution is the same answer, without multiple steps. 

Super User
Posts: 5,081

Re: Frequencies not adding up

Many ways to skin the cat here.  My two-step PROC FREQ also gets those numbers if you simplify the second PROC FREQ:

 

proc freq data=summarized_counts;

tables MOTPComponentDescription;

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 579 views
  • 0 likes
  • 4 in conversation