Hello,
I am working with a dataset that contains demographics data that does not always align (i.e. someone can have multiple records and they do not always match). I am trying to code my program so that I can count how many times someone is reported as sex=female or sex=male and then I need to make a new variable for sex based on which outcome occurs the most.
For example, this is kind of like what I have
SSN | Sex |
111111 | M |
111111111 | M |
111111111 | M |
111111111 | F |
And then I would want something like this (could look different, but this gist):
SSN | Sex1 | N1 | Sex2 | N2 | NewSex |
111111111 | M | 3 | F | 1 | M |
Does anyone have any ideas on how I can achieve this?
TIA
proc freq data=sashelp.heart noprint;
table BP_Status*sex/out=have list;
run;
proc sql noprint;
select max(n) into : n
from (select count(*) as n from have group by BP_Status);
quit;
proc summary data=have;
by BP_Status ;
output out=temp(drop=_:) idgroup(out[&n] (sex count)=);
run;
data want;
set temp;
array x{*} sex:;
array y{*} count:;
new=x{ whichn(max(of y{*}),of y{*}) };
run;
proc freq data=sashelp.heart noprint;
table BP_Status*sex/out=have list;
run;
proc sql noprint;
select max(n) into : n
from (select count(*) as n from have group by BP_Status);
quit;
proc summary data=have;
by BP_Status ;
output out=temp(drop=_:) idgroup(out[&n] (sex count)=);
run;
data want;
set temp;
array x{*} sex:;
array y{*} count:;
new=x{ whichn(max(of y{*}),of y{*}) };
run;
Assuming the data are sorted, and there are no other variables that vary per #4, then
data want;
do until (last.ssn);
set have;
by ssn;
if sex='M' then nmales=sum(nmales,1); else
if sex='F' then nfemales=sum(nfemales,1);
end;
run;
data have;
input SSN $ Sex :$1.;
datalines;
111111111 M
111111111 M
111111111 M
111111111 F
222222222 F
222222222 F
333333333 M
333333333 F
;
proc sql;
create table max as
select
ssn,
sex
from sum
group by ssn
having count = max(count);
quit;
data
want (drop=ct)
check
;
do until (last.ssn);
set max;
by ssn;
ct = sum(ct,1);
end;
do until (last.ssn);
set max;
by ssn;
if ct > 1
then output check;
else output want;
end;
run;
I avoided to create a wide dataset, as you will do this for several variables; you can always create a wide report with PROC REPORT if needed.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.