BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lousam
Obsidian | Level 7

Hello,

 

I am currently working on a longitudinal dataset that contains a lot of 0/1 variables (~90) . The dataset looks something like this (please note there is no pattern to the variable names):

 

ID     initial           mid1         final              PCI

100     0                 .                 1                    .

100     .                  0                .                     .

100     .                  .                 .                     1

200    1                  .                 0                    .

200    .                   .                 .                     1

300    0                  .                 .                     .

300     .                  .                 1                    .   

300     .                 0                  .                    .

300     .                 .                   .                    0    

 

I need to create a new 'summary' data set where that contains only one observation per ID. The dataset should look something like this: 

ID      initial           mid1         final              PCI

100     0                 0                1                    1

200     1                  .                0                    1

300     0                  0               1                    0

 

How can I do this?

 

The only way I can think of doing this would be to create new variables that sums the 0/1 values for each variable and outputs the last value (using retain and first. last.). Unfortunately, this will take too much time and will require creating a lot of new variables.  Is there a simpler way of doing this? I appreciate any help you can offer.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20
data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;

proc sql;
   create table want as
   select ID
         ,max(initial) as initial
         ,max(mid1) as mid1
         ,max(final) as final
         ,max(PCI) as PCI
   from have
   group by ID;
quit;

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20


data have;
infile cards truncover;
input ID      var_a        var_b         var_c             var_d;
cards;
100     0                 .                 1                    .
100     .                  0                .                     .
100     .                  .                 .                     1
200    1                  .                 0                    .
200    .                   .                 .                     1
300    0                  .                 .                     .
300     .                  .                 1                    .   
300     .                 0                  .                    .
300     .                 .                   .                    0    
;
proc means data = have nway noprint;
class id;
var var_a--var_d;
output out=want(drop=_:) max=;
run;
PeterClemmensen
Tourmaline | Level 20
data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;

proc sql;
   create table want as
   select ID
         ,max(initial) as initial
         ,max(mid1) as mid1
         ,max(final) as final
         ,max(PCI) as PCI
   from have
   group by ID;
quit;
Astounding
PROC Star

If you want the last value for each variable, it's easy.  

 

This requires that your data set is sorted by id:

 

data want;

update have (obs=0) have;

by id;

run;

 

This will not necessarily be the sum, or the max, or any statistic.  It will be the last value found, searching the observations in order.

kiranv_
Rhodochrosite | Level 12

if you know variable which you should not include and replace your dataset and libname from the code below

proc sql noprint;
select cat('max(',trim(name), ') as ', name) into :Val separated by ',' from  dictionary.columns 
where upcase(memname) ="HAVE"
and upcase(Libname) =  "WORK"  
and name ne "ID";
%put &val; 


proc sql;
create table want as 
select id, &val  from have
group by ID;
DWilson
Pyrite | Level 9

I'm a big fan of by processing. Something like this:

 

data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;
run;

 

data one(keep=ID var1 var2 var3 var4);
retain var1 var2 var3 var4;
set have;
by ID;
if first.id then do;
var1=initial;
var2=mid1;
var3=final;
var4=PCI;
end;
else do;
if var1<initial then var1=initial;
if var2<mid1 then var2=mid1;
if var3<final then var3=final;
if var4<PCI then var4=PCI;
end;
if last.ID then output;
run;
proc print data=one;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2930 views
  • 8 likes
  • 6 in conversation