## Outputting the maximum value for each variable (by ID) into a new dataset

Hello,

I am currently working on a longitudinal dataset that contains a lot of 0/1 variables (~90) . The dataset looks something like this (please note there is no pattern to the variable names):

ID     initial           mid1         final              PCI

100     0                 .                 1                    .

100     .                  0                .                     .

100     .                  .                 .                     1

200    1                  .                 0                    .

200    .                   .                 .                     1

300    0                  .                 .                     .

300     .                  .                 1                    .

300     .                 0                  .                    .

300     .                 .                   .                    0

I need to create a new 'summary' data set where that contains only one observation per ID. The dataset should look something like this:

ID      initial           mid1         final              PCI

100     0                 0                1                    1

200     1                  .                0                    1

300     0                  0               1                    0

How can I do this?

The only way I can think of doing this would be to create new variables that sums the 0/1 values for each variable and outputs the last value (using retain and first. last.). Unfortunately, this will take too much time and will require creating a lot of new variables.  Is there a simpler way of doing this? I appreciate any help you can offer.

1 ACCEPTED SOLUTION

Accepted Solutions

## Re: Outputting the maximum value for each variable (by ID) into a new dataset

``````data have;
input ID \$ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;

proc sql;
create table want as
select ID
,max(initial) as initial
,max(mid1) as mid1
,max(final) as final
,max(PCI) as PCI
from have
group by ID;
quit;``````
5 REPLIES 5  novinosrin
Tourmaline | Level 20

## Re: Outputting the maximum value for each variable (by ID) into a new dataset

``````

data have;
infile cards truncover;
input ID      var_a        var_b         var_c             var_d;
cards;
100     0                 .                 1                    .
100     .                  0                .                     .
100     .                  .                 .                     1
200    1                  .                 0                    .
200    .                   .                 .                     1
300    0                  .                 .                     .
300     .                  .                 1                    .
300     .                 0                  .                    .
300     .                 .                   .                    0
;
proc means data = have nway noprint;
class id;
var var_a--var_d;
output out=want(drop=_:) max=;
run;``````

## Re: Outputting the maximum value for each variable (by ID) into a new dataset

``````data have;
input ID \$ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;

proc sql;
create table want as
select ID
,max(initial) as initial
,max(mid1) as mid1
,max(final) as final
,max(PCI) as PCI
from have
group by ID;
quit;``````

## Re: Outputting the maximum value for each variable (by ID) into a new dataset

If you want the last value for each variable, it's easy.

This requires that your data set is sorted by id:

data want;

update have (obs=0) have;

by id;

run;

This will not necessarily be the sum, or the max, or any statistic.  It will be the last value found, searching the observations in order.

## Re: Outputting the maximum value for each variable (by ID) into a new dataset

if you know variable which you should not include and replace your dataset and libname from the code below

``````proc sql noprint;
select cat('max(',trim(name), ') as ', name) into :Val separated by ',' from  dictionary.columns
where upcase(memname) ="HAVE"
and upcase(Libname) =  "WORK"
and name ne "ID";
%put &val;

proc sql;
create table want as
select id, &val  from have
group by ID;``````

## Re: Outputting the maximum value for each variable (by ID) into a new dataset

I'm a big fan of by processing. Something like this:

data have;
input ID \$ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;
run;

data one(keep=ID var1 var2 var3 var4);
retain var1 var2 var3 var4;
set have;
by ID;
if first.id then do;
var1=initial;
var2=mid1;
var3=final;
var4=PCI;
end;
else do;
if var1<initial then var1=initial;
if var2<mid1 then var2=mid1;
if var3<final then var3=final;
if var4<PCI then var4=PCI;
end;
if last.ID then output;
run;
proc print data=one;
run;

Discussion stats
• 5 replies
• 2642 views
• 8 likes
• 6 in conversation