Hi,
I have a dataset with an ID, two date vars, and a categorical variable:
ID Date1 Date2 Categ_var
1 1/1/2010 3/5/2010 abc
1 10/11/2010 10/12/2010 def
1 1/5/2011 5/10/2011 ghi
What I want is a summary dataset that will give me for each ID the min Date1, the max Date2, and the value of Categ_var that holds at the time of max(Date2), like this:
ID min_Date1 max_Date2 Categ_var
1 1/1/2010 5/10/2011 ghi
I was planning to do the min of Date1 and max of Date2 with a proc sql step, but I wasn't sure how to integrate the categ_var part.
Any help is much appreciated.
Thanks!
Sure.
data have;
input ID Date1 : mmddyy10. Date2 : mmddyy10. Categ_var $ Num_var;
format Date1 Date2 mmddyy10.;
cards;
1 1/1/2010 3/5/2010 abc 1
1 10/11/2010 10/12/2010 def 2
1 1/5/2011 5/10/2011 ghi 3
;
run;
proc sort data=have;by id date2;run;
data want;
set have;
by id;
if first.id then do;min_date1=999999;sum=0;end;
retain min_date1;
min_date1=min(min_date1,date1);sum+Num_var;
if last.id then do; max_date2=date2;output;end;
drop date1 date2 Num_var;
format min_date1 max_date2 mmddyy10.;
run;
Is it possible to have ties for the max date?
I think you may want something like
proc summary data=have nway;
class id;
var date1 date2;
output out=want (drop= _;) min(date1)=date1 max(date2)=date2 maxid(date2(categ_var))=Categ_var;
run;
data have;
input ID Date1 : mmddyy10. Date2 : mmddyy10. Categ_var $;
format Date1 Date2 mmddyy10.;
cards;
1 1/1/2010 3/5/2010 abc
1 10/11/2010 10/12/2010 def
1 1/5/2011 5/10/2011 ghi
;
run;
proc sort data=have;by id date2;run;
data want;
set have;
by id;
if first.id then min_date1=999999;
retain min_date1;
min_date1=min(min_date1,date1);
if last.id then do; max_date2=date2;output;end;
drop date1 date2;
format min_date1 max_date2 mmddyy10.;
run;
Thank you, this solution seems to work. If I had in addition a numerical var (Num_var) and I wanted the sum of num_var for each value of ID in addition to the rest of it, could I do that with just a slight alteration to this code?
Sure.
data have;
input ID Date1 : mmddyy10. Date2 : mmddyy10. Categ_var $ Num_var;
format Date1 Date2 mmddyy10.;
cards;
1 1/1/2010 3/5/2010 abc 1
1 10/11/2010 10/12/2010 def 2
1 1/5/2011 5/10/2011 ghi 3
;
run;
proc sort data=have;by id date2;run;
data want;
set have;
by id;
if first.id then do;min_date1=999999;sum=0;end;
retain min_date1;
min_date1=min(min_date1,date1);sum+Num_var;
if last.id then do; max_date2=date2;output;end;
drop date1 date2 Num_var;
format min_date1 max_date2 mmddyy10.;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.