Hi,
I have a dataset with an ID, two date vars, and a categorical variable:
ID Date1 Date2 Categ_var
1 1/1/2010 3/5/2010 abc
1 10/11/2010 10/12/2010 def
1 1/5/2011 5/10/2011 ghi
What I want is a summary dataset that will give me for each ID the min Date1, the max Date2, and the value of Categ_var that holds at the time of max(Date2), like this:
ID min_Date1 max_Date2 Categ_var
1 1/1/2010 5/10/2011 ghi
I was planning to do the min of Date1 and max of Date2 with a proc sql step, but I wasn't sure how to integrate the categ_var part.
Any help is much appreciated.
Thanks!
Sure.
data have;
input ID Date1 : mmddyy10. Date2 : mmddyy10. Categ_var $ Num_var;
format Date1 Date2 mmddyy10.;
cards;
1 1/1/2010 3/5/2010 abc 1
1 10/11/2010 10/12/2010 def 2
1 1/5/2011 5/10/2011 ghi 3
;
run;
proc sort data=have;by id date2;run;
data want;
set have;
by id;
if first.id then do;min_date1=999999;sum=0;end;
retain min_date1;
min_date1=min(min_date1,date1);sum+Num_var;
if last.id then do; max_date2=date2;output;end;
drop date1 date2 Num_var;
format min_date1 max_date2 mmddyy10.;
run;
Is it possible to have ties for the max date?
I think you may want something like
proc summary data=have nway;
class id;
var date1 date2;
output out=want (drop= _;) min(date1)=date1 max(date2)=date2 maxid(date2(categ_var))=Categ_var;
run;
data have;
input ID Date1 : mmddyy10. Date2 : mmddyy10. Categ_var $;
format Date1 Date2 mmddyy10.;
cards;
1 1/1/2010 3/5/2010 abc
1 10/11/2010 10/12/2010 def
1 1/5/2011 5/10/2011 ghi
;
run;
proc sort data=have;by id date2;run;
data want;
set have;
by id;
if first.id then min_date1=999999;
retain min_date1;
min_date1=min(min_date1,date1);
if last.id then do; max_date2=date2;output;end;
drop date1 date2;
format min_date1 max_date2 mmddyy10.;
run;
Thank you, this solution seems to work. If I had in addition a numerical var (Num_var) and I wanted the sum of num_var for each value of ID in addition to the rest of it, could I do that with just a slight alteration to this code?
Sure.
data have;
input ID Date1 : mmddyy10. Date2 : mmddyy10. Categ_var $ Num_var;
format Date1 Date2 mmddyy10.;
cards;
1 1/1/2010 3/5/2010 abc 1
1 10/11/2010 10/12/2010 def 2
1 1/5/2011 5/10/2011 ghi 3
;
run;
proc sort data=have;by id date2;run;
data want;
set have;
by id;
if first.id then do;min_date1=999999;sum=0;end;
retain min_date1;
min_date1=min(min_date1,date1);sum+Num_var;
if last.id then do; max_date2=date2;output;end;
drop date1 date2 Num_var;
format min_date1 max_date2 mmddyy10.;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.