Hello,
I am working with a dataset that contains demographics data that does not always align (i.e. someone can have multiple records and they do not always match). I am trying to code my program so that I can count how many times someone is reported as sex=female or sex=male and then I need to make a new variable for sex based on which outcome occurs the most.
For example, this is kind of like what I have
SSN | Sex |
111111 | M |
111111111 | M |
111111111 | M |
111111111 | F |
And then I would want something like this (could look different, but this gist):
SSN | Sex1 | N1 | Sex2 | N2 | NewSex |
111111111 | M | 3 | F | 1 | M |
Does anyone have any ideas on how I can achieve this?
TIA
proc freq data=sashelp.heart noprint;
table BP_Status*sex/out=have list;
run;
proc sql noprint;
select max(n) into : n
from (select count(*) as n from have group by BP_Status);
quit;
proc summary data=have;
by BP_Status ;
output out=temp(drop=_:) idgroup(out[&n] (sex count)=);
run;
data want;
set temp;
array x{*} sex:;
array y{*} count:;
new=x{ whichn(max(of y{*}),of y{*}) };
run;
proc freq data=sashelp.heart noprint;
table BP_Status*sex/out=have list;
run;
proc sql noprint;
select max(n) into : n
from (select count(*) as n from have group by BP_Status);
quit;
proc summary data=have;
by BP_Status ;
output out=temp(drop=_:) idgroup(out[&n] (sex count)=);
run;
data want;
set temp;
array x{*} sex:;
array y{*} count:;
new=x{ whichn(max(of y{*}),of y{*}) };
run;
Assuming the data are sorted, and there are no other variables that vary per #4, then
data want;
do until (last.ssn);
set have;
by ssn;
if sex='M' then nmales=sum(nmales,1); else
if sex='F' then nfemales=sum(nfemales,1);
end;
run;
data have;
input SSN $ Sex :$1.;
datalines;
111111111 M
111111111 M
111111111 M
111111111 F
222222222 F
222222222 F
333333333 M
333333333 F
;
proc sql;
create table max as
select
ssn,
sex
from sum
group by ssn
having count = max(count);
quit;
data
want (drop=ct)
check
;
do until (last.ssn);
set max;
by ssn;
ct = sum(ct,1);
end;
do until (last.ssn);
set max;
by ssn;
if ct > 1
then output check;
else output want;
end;
run;
I avoided to create a wide dataset, as you will do this for several variables; you can always create a wide report with PROC REPORT if needed.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.