Beanpot
Fluorite | Level 6

Hi,

I have a dataset in which the same value can appear in several different variables. For example:

 

Obs  Var1  Var2  Var3
1    n1    n2    n3
2    n3    n4    .
3    n5    n1    .

 

In order to determine the frequency of each particular value I'm creating a new variable for each value using the following code:

data want;
  set have;
  if whichc('n3', of var_1-var_20) then n3 = 1;
  else n3 = 0;
run;

The problem I'm encountering is that when I check the frequency of the values I create against the QC table I'm off by a small percent. For example, the QC table might say total observations with value n3 is 10,000 and when I do a proc freq on the new variable I get 10,020.

 

I'm wondering if the code might be double counting? If so I'm not sure how to fix this. Are there other reasons the counts could be off?

 

Thanks!

4 REPLIES
Beanpot
Fluorite | Level 6
To clarify, if an observation had the value n3 under multiple variables, would my code count that twice?
PaigeMiller
Diamond | Level 26

Your code using WHICHC would count it once per record, not once per variable.

 

A careful examination of the results from your code would confirm this.
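
One way to check this is with a small test case. The sketch below uses made-up data (the dataset names demo and flagged, the id variable, and the three var_ columns are assumptions for illustration, not your real data) in which record 2 holds n3 in two variables:

```sas
/* Hypothetical test data: record 2 has 'n3' in two variables */
data demo;
  input id var_1 $ var_2 $ var_3 $;
  datalines;
1 n1 n2 n3
2 n3 n3 n4
3 n5 n1 .
;
run;

data flagged;
  set demo;
  /* WHICHC returns the position of the first match, or 0 if there is none, */
  /* so the flag is 1 or 0 per record no matter how many variables match.   */
  n3 = (whichc('n3', of var_1-var_3) > 0);
run;

proc freq data=flagged;
  tables n3;
run;
```

Records 1 and 2 should each be flagged once, so PROC FREQ should show n3=1 for two records rather than three.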

--
Paige Miller
Tom
Super User

Your posted code is checking if ANY of the 20 variables is exactly equal to the string 'n3'.  It will produce the same result when all 20 of them have n3 as it will when only one of them does.

 

Do any of the values have leading spaces?  Or contain invisible characters such as TAB, LF, CR, FF, or non-breaking space ('090A0D0CA0'x)?  For WHICHC to find a value, it needs to match exactly.
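
If that turns out to be the case, something like the following sketch could strip those characters before the comparison. It assumes the variables are var_1-var_20 as in your posted code, that no variable named i already exists in HAVE, and that the session uses a single-byte encoding where 'A0'x is the non-breaking space (in a UTF-8 session that byte can be part of a multi-byte character, so the list would need adjusting). The output dataset name clean is made up for the example.

```sas
data clean;
  set have;
  array v {*} var_1-var_20;
  do i = 1 to dim(v);
    /* Remove TAB, LF, CR, FF and non-breaking space (single-byte   */
    /* encoding assumed), then left-align to drop leading blanks.   */
    v{i} = left(compress(v{i}, '090A0D0CA0'x));
  end;
  /* Flag the record once if any cleaned variable equals 'n3' */
  n3 = (whichc('n3', of v{*}) > 0);
  drop i;
run;
```

Comparing PROC FREQ on n3 before and after the cleaning would show whether invisible characters are affecting the count.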

 

 

 

 

Quentin
Super User

As an alternative approach, you could transpose your data into a vertical format, then de-duplicate it to avoid double-counting, and then run PROC FREQ.  Something like:

 

data have ;
  input id Var1 : $2. Var2 : $2. Var3 : $2. ;
  cards ;
1 n1 n2 n3
2 n3 n4 .
3 n5 n1 .
4 n1 n2 n2
;
run ;

proc transpose data=have out=vert ;
  var _character_ ;
  by id ;
run ;

proc sort nodupkey data=vert out=vert2 ;
  by id col1 ;
run ;

proc freq data=vert2 ;
  tables col1 ;
run ;
