SAS Programming

christineford · Posted 10-14-2021 09:01 AM

Hello,

I am working with a dataset that contains demographics data that does not always align (i.e. someone can have multiple records and they do not always match). I am trying to code my program so that I can count how many times someone is reported as sex=female or sex=male and then I need to make a new variable for sex based on which outcome occurs the most.

For example, this is kind of like what I have

SSN	Sex
111111	M
111111111	M
111111111	M
111111111	F

And then I would want something like this (could look different, but this gist):

SSN	Sex1	N1	Sex2	N2	NewSex
111111111	M	3	F	1	M

Does anyone have any ideas on how I can achieve this?

TIA

Ksharp · Posted 10-14-2021 09:33 AM

proc freq data=sashelp.heart noprint;
table BP_Status*sex/out=have list;
run;

proc sql noprint;
select max(n) into : n
 from (select count(*) as n from have group by BP_Status);
quit;

proc summary data=have;
by BP_Status ;
output out=temp(drop=_:) idgroup(out[&n] (sex count)=);
run;

data want;
 set temp;
 array x{*} sex:;
 array y{*} count:;
 new=x{  whichn(max(of y{*}),of y{*})  };
run;

View solution in original post

Ksharp · Posted 10-14-2021 09:33 AM

proc freq data=sashelp.heart noprint;
table BP_Status*sex/out=have list;
run;

proc sql noprint;
select max(n) into : n
 from (select count(*) as n from have group by BP_Status);
quit;

proc summary data=have;
by BP_Status ;
output out=temp(drop=_:) idgroup(out[&n] (sex count)=);
run;

data want;
 set temp;
 array x{*} sex:;
 array y{*} count:;
 new=x{  whichn(max(of y{*}),of y{*})  };
run;

christineford · Posted 10-14-2021 10:07 AM

This was perfect! Thank you

mkeintz · Posted 10-14-2021 09:36 AM

@christineford

What if there is a tie?
Your sample seems to assume that SSN 111111 (first row) is the same person as SSN 1111111111 rows 2 through 4). Is that just a typo?
Is your data already sorted by ssn?
Do you want to reduce multiple records to one as in your example? This would be a problem if your original records have variables besides SEX that vary over a set of matching SSN's.

Assuming the data are sorted, and there are no other variables that vary per #4, then

data want;
  do until (last.ssn);
    set have;
    by ssn;
    if sex='M' then nmales=sum(nmales,1); else
    if sex='F' then nfemales=sum(nfemales,1);
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

christineford · Posted 10-14-2021 09:42 AM

What if there is a tie? If there is a tie I need to check another database
Your sample seems to assume that SSN 111111 (first row) is the same person as SSN 1111111111 rows 2 through 4). Is that just a typo? Yes this was a typo - it should be all the same SSN for the example
Is your data already sorted by ssn? Yes
Do you want to reduce multiple records to one as in your example? This would be a problem if your original records have variables besides SEX that vary over a set of matching SSN's. -- I need to check multiple variables that have the same issue (take the most recurring value for each variable I need). My idea was to do this in segments to make it less complicated and then create a new dataset with one line per person with all of the necessary demographic information

Kurt_Bremser · Posted 10-14-2021 09:58 AM

data have;
input SSN $ Sex :$1.;
datalines;
111111111 M
111111111 M
111111111 M
111111111 F
222222222 F
222222222 F
333333333 M
333333333 F
;

proc sql;
create table max as
select
  ssn,
  sex
from sum
group by ssn
having count = max(count);
quit;

data
  want (drop=ct)
  check
;
do until (last.ssn);
  set max;
  by ssn;
  ct = sum(ct,1);
end;
do until (last.ssn);
  set max;
  by ssn;
  if ct > 1
  then output check;
  else output want;
end;
run;

I avoided to create a wide dataset, as you will do this for several variables; you can always create a wide report with PROC REPORT if needed.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

SAS Programming

Getting frequency counts for different values of a variable by id

Re: Getting frequency counts for different values of a variable by id

Re: Getting frequency counts for different values of a variable by id

Re: Getting frequency counts for different values of a variable by id

Re: Getting frequency counts for different values of a variable by id

Re: Getting frequency counts for different values of a variable by id

Re: Getting frequency counts for different values of a variable by id

Conditional replacement of a variable values

Create variable that counts, accumulates and displays events at differ...

Transforming the Frequency of Time Series Data

Comparing counts of different values of a variable present in two data...

Get Frequency of all combinations of dummy variables

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...