BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
christineford
Calcite | Level 5

Hello, 

 

I am working with a dataset that contains demographics data that does not always align (i.e. someone can have multiple records and they do not always match). I am trying to code my program so that I can count how many times someone is reported as sex=female or sex=male and then I need to make a new variable for sex based on which outcome occurs the most. 

 

For example, this is kind of like what I have

SSNSex
111111M
111111111M
111111111M
111111111F

 

And then I would want something like this (could look different, but this gist):

SSNSex1N1Sex2N2NewSex
111111111M3F1M

 

Does anyone have any ideas on how I can achieve this? 

TIA

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
proc freq data=sashelp.heart noprint;
table BP_Status*sex/out=have list;
run;

proc sql noprint;
select max(n) into : n
 from (select count(*) as n from have group by BP_Status);
quit;

proc summary data=have;
by BP_Status ;
output out=temp(drop=_:) idgroup(out[&n] (sex count)=);
run;

data want;
 set temp;
 array x{*} sex:;
 array y{*} count:;
 new=x{  whichn(max(of y{*}),of y{*})  };
run;

View solution in original post

5 REPLIES 5
Ksharp
Super User
proc freq data=sashelp.heart noprint;
table BP_Status*sex/out=have list;
run;

proc sql noprint;
select max(n) into : n
 from (select count(*) as n from have group by BP_Status);
quit;

proc summary data=have;
by BP_Status ;
output out=temp(drop=_:) idgroup(out[&n] (sex count)=);
run;

data want;
 set temp;
 array x{*} sex:;
 array y{*} count:;
 new=x{  whichn(max(of y{*}),of y{*})  };
run;
christineford
Calcite | Level 5
This was perfect! Thank you
mkeintz
PROC Star

@christineford 

 

  1.  What if there is a tie?

  2. Your sample seems to assume that SSN 111111 (first row) is the same person as SSN 1111111111 rows 2 through 4).  Is that just a typo?

  3. Is your data already sorted by ssn?

  4. Do you want to reduce multiple records to one as in your example?  This would be a problem if your original records have variables besides SEX that vary over a set of matching SSN's.

 

Assuming the data are sorted, and there are no other variables that vary per #4, then

 

data want;
  do until (last.ssn);
    set have;
    by ssn;
    if sex='M' then nmales=sum(nmales,1); else
    if sex='F' then nfemales=sum(nfemales,1);
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
christineford
Calcite | Level 5
  1.  What if there is a tie? If there is a tie I need to check another database 

  2. Your sample seems to assume that SSN 111111 (first row) is the same person as SSN 1111111111 rows 2 through 4).  Is that just a typo? Yes this was a typo - it should be all the same SSN for the example

  3. Is your data already sorted by ssn? Yes

  4. Do you want to reduce multiple records to one as in your example?  This would be a problem if your original records have variables besides SEX that vary over a set of matching SSN's. -- I need to check multiple variables that have the same issue (take the most recurring value for each variable I need). My idea was to do this in segments to make it less complicated and then create a new dataset with one line per person with all of the necessary demographic information 
Kurt_Bremser
Super User
data have;
input SSN $ Sex :$1.;
datalines;
111111111 M
111111111 M
111111111 M
111111111 F
222222222 F
222222222 F
333333333 M
333333333 F
;

proc sql;
create table max as
select
  ssn,
  sex
from sum
group by ssn
having count = max(count);
quit;

data
  want (drop=ct)
  check
;
do until (last.ssn);
  set max;
  by ssn;
  ct = sum(ct,1);
end;
do until (last.ssn);
  set max;
  by ssn;
  if ct > 1
  then output check;
  else output want;
end;
run;

I avoided to create a wide dataset, as you will do this for several variables; you can always create a wide report with PROC REPORT if needed. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1209 views
  • 3 likes
  • 4 in conversation