BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
r4321
Pyrite | Level 9

Hello all, 

 

In my data, I would like to sum columns based on multiple column criteria. Notmally, for example, if I had a single column to sum like ‘gender’ I would just sum it based on year-id and I could tell how many women I have per year per organization. However, I have 3 columns and I would like to how often each of their unique combinations occurs per year-id. So, I have variables named year-id, description (which is a text based description), and ethnicity (binary variable). There is not a unique id for each observation, something like “1997-101   Tall    1’ or ‘1998-102  Short   0’ could appear several times and I want to count how many unique combinations of each observation there are. Also, there are many different description types. If someone could give me some suggestions, I would very much appreciate it! 

 

Thanks in advance. 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@r4321 

I don't fully understand what you're asking for so below just as a thought options how you could generate an additional variable Group ID - just to make further coding simpler.

data have;
  infile datalines truncover;
  input (yearid description gender) ($) ethnicity otherVar $;
  datalines;
1 aaa F 0 W
1 bbb M 1 X
1 aaa F 0 Y
2 aaa F 0 Z
;
run;

/* option 1 */
data option1;
  set have;
  groupID=put(md5(upcase(catx('|',yearid,description,gender,ethnicity))),hex32.);
run;

/* option 2 */
proc sort data=have out=option2;
  by yearid description gender ethnicity;
run;
data option2;
  set option2;
  by yearid description gender ethnicity;
  if first.ethnicity then groupID+1;
run;

MD5() used in option 1 is deterministic and will for the same source string always return the same hash value (digest value). This can be useful to combine data from different sources.

I've added the upcase() funtion to make the hash value generation case insensitive.

 

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

Perhaps you could show us a representative portion of your data, following these instructions, and then also show us the desired output.

 

Also, in one place you talk about a sum, but in another place you say "I want to count how many unique combinations of each observation there are". So we need additional clarification there as well.

--
Paige Miller
Astounding
PROC Star

PROC FREQ allows you to count combinations.  If that's what you need, it's fairly straightforward.  For example:

 

proc freq data=have;
tables yearid * description * gender  / missing list;
run;

You might be searching for a slightly different set of counts, but the approach would likely remain the same.

Patrick
Opal | Level 21

@r4321 

I don't fully understand what you're asking for so below just as a thought options how you could generate an additional variable Group ID - just to make further coding simpler.

data have;
  infile datalines truncover;
  input (yearid description gender) ($) ethnicity otherVar $;
  datalines;
1 aaa F 0 W
1 bbb M 1 X
1 aaa F 0 Y
2 aaa F 0 Z
;
run;

/* option 1 */
data option1;
  set have;
  groupID=put(md5(upcase(catx('|',yearid,description,gender,ethnicity))),hex32.);
run;

/* option 2 */
proc sort data=have out=option2;
  by yearid description gender ethnicity;
run;
data option2;
  set option2;
  by yearid description gender ethnicity;
  if first.ethnicity then groupID+1;
run;

MD5() used in option 1 is deterministic and will for the same source string always return the same hash value (digest value). This can be useful to combine data from different sources.

I've added the upcase() funtion to make the hash value generation case insensitive.

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 3199 views
  • 4 likes
  • 4 in conversation