BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lousam
Obsidian | Level 7

Hello,

 

I am currently working on a longitudinal dataset that contains a lot of 0/1 variables (~90) . The dataset looks something like this (please note there is no pattern to the variable names):

 

ID     initial           mid1         final              PCI

100     0                 .                 1                    .

100     .                  0                .                     .

100     .                  .                 .                     1

200    1                  .                 0                    .

200    .                   .                 .                     1

300    0                  .                 .                     .

300     .                  .                 1                    .   

300     .                 0                  .                    .

300     .                 .                   .                    0    

 

I need to create a new 'summary' data set where that contains only one observation per ID. The dataset should look something like this: 

ID      initial           mid1         final              PCI

100     0                 0                1                    1

200     1                  .                0                    1

300     0                  0               1                    0

 

How can I do this?

 

The only way I can think of doing this would be to create new variables that sums the 0/1 values for each variable and outputs the last value (using retain and first. last.). Unfortunately, this will take too much time and will require creating a lot of new variables.  Is there a simpler way of doing this? I appreciate any help you can offer.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20
data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;

proc sql;
   create table want as
   select ID
         ,max(initial) as initial
         ,max(mid1) as mid1
         ,max(final) as final
         ,max(PCI) as PCI
   from have
   group by ID;
quit;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20


data have;
infile cards truncover;
input ID      var_a        var_b         var_c             var_d;
cards;
100     0                 .                 1                    .
100     .                  0                .                     .
100     .                  .                 .                     1
200    1                  .                 0                    .
200    .                   .                 .                     1
300    0                  .                 .                     .
300     .                  .                 1                    .   
300     .                 0                  .                    .
300     .                 .                   .                    0    
;
proc means data = have nway noprint;
class id;
var var_a--var_d;
output out=want(drop=_:) max=;
run;
PeterClemmensen
Tourmaline | Level 20
data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;

proc sql;
   create table want as
   select ID
         ,max(initial) as initial
         ,max(mid1) as mid1
         ,max(final) as final
         ,max(PCI) as PCI
   from have
   group by ID;
quit;
Astounding
PROC Star

If you want the last value for each variable, it's easy.  

 

This requires that your data set is sorted by id:

 

data want;

update have (obs=0) have;

by id;

run;

 

This will not necessarily be the sum, or the max, or any statistic.  It will be the last value found, searching the observations in order.

kiranv_
Rhodochrosite | Level 12

if you know variable which you should not include and replace your dataset and libname from the code below

proc sql noprint;
select cat('max(',trim(name), ') as ', name) into :Val separated by ',' from  dictionary.columns 
where upcase(memname) ="HAVE"
and upcase(Libname) =  "WORK"  
and name ne "ID";
%put &val; 


proc sql;
create table want as 
select id, &val  from have
group by ID;
DWilson
Pyrite | Level 9

I'm a big fan of by processing. Something like this:

 

data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;
run;

 

data one(keep=ID var1 var2 var3 var4);
retain var1 var2 var3 var4;
set have;
by ID;
if first.id then do;
var1=initial;
var2=mid1;
var3=final;
var4=PCI;
end;
else do;
if var1<initial then var1=initial;
if var2<mid1 then var2=mid1;
if var3<final then var3=final;
if var4<PCI then var4=PCI;
end;
if last.ID then output;
run;
proc print data=one;
run;

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 4417 views
  • 8 likes
  • 6 in conversation