Hello,
I am currently working on a longitudinal dataset that contains a lot of 0/1 variables (~90) . The dataset looks something like this (please note there is no pattern to the variable names):
ID initial mid1 final PCI
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
I need to create a new 'summary' data set where that contains only one observation per ID. The dataset should look something like this:
ID initial mid1 final PCI
100 0 0 1 1
200 1 . 0 1
300 0 0 1 0
How can I do this?
The only way I can think of doing this would be to create new variables that sums the 0/1 values for each variable and outputs the last value (using retain and first. last.). Unfortunately, this will take too much time and will require creating a lot of new variables. Is there a simpler way of doing this? I appreciate any help you can offer.
data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;
proc sql;
create table want as
select ID
,max(initial) as initial
,max(mid1) as mid1
,max(final) as final
,max(PCI) as PCI
from have
group by ID;
quit;
data have;
infile cards truncover;
input ID var_a var_b var_c var_d;
cards;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;
proc means data = have nway noprint;
class id;
var var_a--var_d;
output out=want(drop=_:) max=;
run;
data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;
proc sql;
create table want as
select ID
,max(initial) as initial
,max(mid1) as mid1
,max(final) as final
,max(PCI) as PCI
from have
group by ID;
quit;
If you want the last value for each variable, it's easy.
This requires that your data set is sorted by id:
data want;
update have (obs=0) have;
by id;
run;
This will not necessarily be the sum, or the max, or any statistic. It will be the last value found, searching the observations in order.
if you know variable which you should not include and replace your dataset and libname from the code below
proc sql noprint;
select cat('max(',trim(name), ') as ', name) into :Val separated by ',' from dictionary.columns
where upcase(memname) ="HAVE"
and upcase(Libname) = "WORK"
and name ne "ID";
%put &val;
proc sql;
create table want as
select id, &val from have
group by ID;
I'm a big fan of by processing. Something like this:
data have;
input ID $ initial mid1 final PCI;
datalines;
100 0 . 1 .
100 . 0 . .
100 . . . 1
200 1 . 0 .
200 . . . 1
300 0 . . .
300 . . 1 .
300 . 0 . .
300 . . . 0
;
run;
data one(keep=ID var1 var2 var3 var4);
retain var1 var2 var3 var4;
set have;
by ID;
if first.id then do;
var1=initial;
var2=mid1;
var3=final;
var4=PCI;
end;
else do;
if var1<initial then var1=initial;
if var2<mid1 then var2=mid1;
if var3<final then var3=final;
if var4<PCI then var4=PCI;
end;
if last.ID then output;
run;
proc print data=one;
run;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.