Solved: Outputting the maximum value for each variable (by ID) into a new data...

lousam · Posted 01-21-2019 12:45 PM

Hello,

I am currently working on a longitudinal dataset that contains a lot of 0/1 variables (~90) . The dataset looks something like this (please note there is no pattern to the variable names):

ID initial mid1 final PCI

100 0 . 1 .

100 . 0 . .

100 . . . 1

200 1 . 0 .

200 . . . 1

300 0 . . .

300 . . 1 .

300 . 0 . .

300 . . . 0

I need to create a new 'summary' data set where that contains only one observation per ID. The dataset should look something like this:

ID initial mid1 final PCI

100 0 0 1 1

200 1 . 0 1

300 0 0 1 0

How can I do this?

The only way I can think of doing this would be to create new variables that sums the 0/1 values for each variable and outputs the last value (using retain and first. last.). Unfortunately, this will take too much time and will require creating a lot of new variables. Is there a simpler way of doing this? I appreciate any help you can offer.

PeterClemmensen · Posted 01-21-2019 01:30 PM

data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;

proc sql;
   create table want as
   select ID
         ,max(initial) as initial
         ,max(mid1) as mid1
         ,max(final) as final
         ,max(PCI) as PCI
   from have
   group by ID;
quit;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

novinosrin · Posted 01-21-2019 12:57 PM



data have;
infile cards truncover;
input ID      var_a        var_b         var_c             var_d;
cards;
100     0                 .                 1                    .
100     .                  0                .                     .
100     .                  .                 .                     1
200    1                  .                 0                    .
200    .                   .                 .                     1
300    0                  .                 .                     .
300     .                  .                 1                    .   
300     .                 0                  .                    .
300     .                 .                   .                    0    
;
proc means data = have nway noprint;
class id;
var var_a--var_d;
output out=want(drop=_:) max=;
run;

PeterClemmensen · Posted 01-21-2019 01:30 PM

data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;

proc sql;
   create table want as
   select ID
         ,max(initial) as initial
         ,max(mid1) as mid1
         ,max(final) as final
         ,max(PCI) as PCI
   from have
   group by ID;
quit;

The DATA to DATA Step Macro
Blog: SASnrd

Astounding · Posted 01-21-2019 01:48 PM

If you want the last value for each variable, it's easy.

This requires that your data set is sorted by id:

data want;

update have (obs=0) have;

by id;

run;

This will not necessarily be the sum, or the max, or any statistic. It will be the last value found, searching the observations in order.

kiranv_ · Posted 01-21-2019 01:49 PM

if you know variable which you should not include and replace your dataset and libname from the code below

proc sql noprint;
select cat('max(',trim(name), ') as ', name) into :Val separated by ',' from  dictionary.columns 
where upcase(memname) ="HAVE"
and upcase(Libname) =  "WORK"  
and name ne "ID";
%put &val; 


proc sql;
create table want as 
select id, &val  from have
group by ID;

DWilson · Posted 02-01-2019 02:58 PM

I'm a big fan of by processing. Something like this:

data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;
run;

data one(keep=ID var1 var2 var3 var4);
retain var1 var2 var3 var4;
set have;
by ID;
if first.id then do;
var1=initial;
var2=mid1;
var3=final;
var4=PCI;
end;
else do;
if var1<initial then var1=initial;
if var2<mid1 then var2=mid1;
if var3<final then var3=final;
if var4<PCI then var4=PCI;
end;
if last.ID then output;
run;
proc print data=one;
run;

Outputting the maximum value for each variable (by ID) into a new dataset

Re: Outputting the maximum value for each variable (by ID) into a new dataset

Re: Outputting the maximum value for each variable (by ID) into a new dataset

Re: Outputting the maximum value for each variable (by ID) into a new dataset

Re: Outputting the maximum value for each variable (by ID) into a new dataset

Re: Outputting the maximum value for each variable (by ID) into a new dataset

Re: Outputting the maximum value for each variable (by ID) into a new dataset

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!