DATA Step, Macro, Functions and more

Choosing categorical var that is true as of max of a date var

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 138
Accepted Solution

Choosing categorical var that is true as of max of a date var

Hi,

 

I have a dataset with an ID, two date vars, and a categorical variable:

 

ID     Date1                Date2            Categ_var

1      1/1/2010            3/5/2010          abc

1      10/11/2010        10/12/2010      def

1      1/5/2011             5/10/2011       ghi

 

What I want is a summary dataset that will give me for each ID the min Date1, the max Date2, and the value of Categ_var that holds at the time of max(Date2), like this:

 

ID   min_Date1  max_Date2  Categ_var

1     1/1/2010     5/10/2011    ghi

 

I was planning to do the min of Date1 and max of Date2 with a proc sql step, but I wasn't sure how to integrate the categ_var part. 

 

Any help is much appreciated.

 

Thanks!


Accepted Solutions
Solution
‎03-18-2016 09:14 AM
Super User
Posts: 10,041

Re: Choosing categorical var that is true as of max of a date var

Posted in reply to Walternate

Sure.

data have;
input ID     Date1  : mmddyy10.      Date2   : mmddyy10.    Categ_var $ Num_var;
format Date1      Date2    mmddyy10.;
cards;
1      1/1/2010            3/5/2010          abc 1
1      10/11/2010        10/12/2010      def 2
1      1/5/2011             5/10/2011       ghi 3
;
run;
proc sort data=have;by id date2;run;
data want;
 set have;
 by id;
 if first.id then do;min_date1=999999;sum=0;end;
 retain min_date1;
 min_date1=min(min_date1,date1);sum+Num_var;
 if last.id then do; max_date2=date2;output;end;
 drop date1 date2 Num_var;
 format min_date1 max_date2 mmddyy10.;
run;

View solution in original post


All Replies
Super User
Posts: 5,434

Re: Choosing categorical var that is true as of max of a date var

Posted in reply to Walternate
If your dag is not huge, you can join the min-max result with your original data.
Hope you don't have duplicates on date2.
Data never sleeps
Super User
Posts: 5,434

Re: Choosing categorical var that is true as of max of a date var

Posted in reply to Walternate
Damn autocorrect. "If your data is not huge", should it read...
Data never sleeps
Super User
Posts: 19,850

Re: Choosing categorical var that is true as of max of a date var

Posted in reply to Walternate

Is it possible to have ties for the max date?

 

 

Super User
Posts: 11,343

Re: Choosing categorical var that is true as of max of a date var

Posted in reply to Walternate

I think you may want something like

proc summary data=have nway;
   class id;
   var date1 date2;
   output out=want (drop= _;) min(date1)=date1 max(date2)=date2 maxid(date2(categ_var))=Categ_var;
run;
Super User
Posts: 10,041

Re: Choosing categorical var that is true as of max of a date var

Posted in reply to Walternate
data have;
input ID     Date1  : mmddyy10.      Date2   : mmddyy10.    Categ_var $;
format Date1      Date2    mmddyy10.;
cards;
1      1/1/2010            3/5/2010          abc
1      10/11/2010        10/12/2010      def
1      1/5/2011             5/10/2011       ghi
;
run;
proc sort data=have;by id date2;run;
data want;
 set have;
 by id;
 if first.id then min_date1=999999;
 retain min_date1;
 min_date1=min(min_date1,date1);
 if last.id then do; max_date2=date2;output;end;
 drop date1 date2;
 format min_date1 max_date2 mmddyy10.;
run;
Frequent Contributor
Posts: 138

Re: Choosing categorical var that is true as of max of a date var

Thank you, this solution seems to work. If I had in addition a numerical var (Num_var) and I wanted the sum of num_var for each value of ID in addition to the rest of it, could I do that with just a slight alteration to this code? 

Solution
‎03-18-2016 09:14 AM
Super User
Posts: 10,041

Re: Choosing categorical var that is true as of max of a date var

Posted in reply to Walternate

Sure.

data have;
input ID     Date1  : mmddyy10.      Date2   : mmddyy10.    Categ_var $ Num_var;
format Date1      Date2    mmddyy10.;
cards;
1      1/1/2010            3/5/2010          abc 1
1      10/11/2010        10/12/2010      def 2
1      1/5/2011             5/10/2011       ghi 3
;
run;
proc sort data=have;by id date2;run;
data want;
 set have;
 by id;
 if first.id then do;min_date1=999999;sum=0;end;
 retain min_date1;
 min_date1=min(min_date1,date1);sum+Num_var;
 if last.id then do; max_date2=date2;output;end;
 drop date1 date2 Num_var;
 format min_date1 max_date2 mmddyy10.;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 307 views
  • 2 likes
  • 5 in conversation