BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
KalaBhairava
Quartz | Level 8

I have data like table-1 and I want it like table-2

I want maximum date for all the records among the one subject.

 

Table-1

subject date
1 3/8/2022 0
2 3/9/2022 0
2 3/10/2022 1
2 3/11/2022 1
3 3/12/2022 0

Table-2

subject date
1 3/8/2022 0
2 3/11/2022 0
2 3/11/2022 1
2 3/11/2022 1
3 3/12/2022 0
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Or

 

data have;
input subject date :mmddyy10. dummy;
format date mmddyy10.;
datalines;
1 3/8/2022  0 
2 3/9/2022  0 
2 3/10/2022 1 
2 3/11/2022 1 
3 3/12/2022 0 
;

data want(drop = d);
   do _N_ = 1 by 1 until (last.subject);
      set have;
      by subject;
      if date > d then d = date;
   end;
   do _N_ = 1 to _N_;
      set have;
      date = d;
      output;
   end;
run;

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20
data have;
input subject date :mmddyy10. dummy;
format date mmddyy10.;
datalines;
1 3/8/2022  0 
2 3/9/2022  0 
2 3/10/2022 1 
2 3/11/2022 1 
3 3/12/2022 0 
;

proc sql;
   create table want as
   select subject
        , max(date) as date format = mmddyy10.
        , dummy
   from have
   group by subject
   ;
quit;

 

Result

 

subject  date        dummy
1        03/08/2022  0 
2        03/11/2022  1 
2        03/11/2022  0 
2        03/11/2022  1 
3        03/12/2022  0 
Anne_A
Quartz | Level 8

Hi KalaBhairava,

Here's a quick solution:

proc sort data = table1 out=temp;
	by subject descending date;
run;

data table2 (drop = h_d);
	set temp;
	by subject;
	retain h_d;
	if first.subject then h_d = date;
	date = h_d;
run;

HTH,

Cheers.

PeterClemmensen
Tourmaline | Level 20

Or

 

data have;
input subject date :mmddyy10. dummy;
format date mmddyy10.;
datalines;
1 3/8/2022  0 
2 3/9/2022  0 
2 3/10/2022 1 
2 3/11/2022 1 
3 3/12/2022 0 
;

data want(drop = d);
   do _N_ = 1 by 1 until (last.subject);
      set have;
      by subject;
      if date > d then d = date;
   end;
   do _N_ = 1 to _N_;
      set have;
      date = d;
      output;
   end;
run;
mkeintz
PROC Star

If the data are sorted by subject:

 

data want;
  set have (in=firstpass) have (in=secondpass);
  by subject;
  retain maxdate;
  if first.subject then maxdate=date;
  else if firstpass then maxdate=max(date,maxdate);
  if secondpass;
run;

But if the data are not sorted, and sorting is too expensive then you should use @FreelanceReinh 's suggestion.   (I've struck out my original content below - it was wrong).

 

data want;
  set have;
  if _n_=1 then do;
    if 0 then set have (rename=(date=maxdate));
    declare hash h (dataset:'have (keep=subject date rename=(date=maxdate))',multidata:'Y',ordered:'D');
      h.definekey('subject');
      h.definedata('maxdate');
      h.definedone();
  end;
  h.find();
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
FreelanceReinh
Jade | Level 19

Thanks, @mkeintz. I always like the elegance of your trademark "firstpass-secondpass" solutions.

 

I think this technique could even help in the "not sorted" case to support the hash object approach. Here's my attempt to combine the two:

data want(drop=rc);
  set have (in=firstpass) have (in=secondpass);
  if _n_=1 then do;
    dcl hash h();
    h.definekey('subject');
    h.definedata('maxdate');
    h.definedone();
    maxdate=.;
  end;
  rc=h.find();
  if firstpass & date>maxdate then h.replace(key: subject, data: date);
  if secondpass;
  format maxdate mmddyy10.;
run;
mkeintz
PROC Star

@FreelanceReinh 

 

I appreciate your positive comment about the firstpass/secondpass paradigm.

 

But if the data is sorted, there is likely an efficiency difference compared to your suggestion.  When you use it with a BY statement, as in

data want;
    set have (in=firstpass) have (in=secondpass);
    by subject;
    .....

the data are interleaved, meaning a given ID group is read twice before proceeding to the next ID group.   It'll be more efficient because that BY group data is likely sitting in memory allocated as a disk buffer after the firstpass. So the secondpass likely reads from memory rather than over the disk channel from external storage.

 

But if you have to do it without the BY statement, as in: 

data want;
    set have (in=firstpass) have (in=secondpass);
    .... other code ....

it reads the ENTIRE dataset for firstpass before rereading during secondpass.  You'll almost certainly be forced to implement a lot more disk activity.

 

Of course, my solution for unsorted data using the hash object statement declare hash h (dataset:'have'.....) would have the same disadvantage.

 

In that case, your solution would likely be a bit better because your hash object would take a lot less memory, having only one "row" per SUBJECT.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
FreelanceReinh
Jade | Level 19

Thanks. Very good point about reading data from memory vs. disk. Yes, for sorted data I wouldn't have suggested this approach.

 

Spoiler
Actually, my main motivation for creating this solution was to "rescue" your hash object idea, which I think is unlikely to work in the original form because the "ordered:'D'" argument tag refers to the key values only, i.e., won't sort the dates (or has this changed in a SAS release newer than 9.4M5?). Adding 'maxdate' to the key portion of h would hamper the intended lookup based only on subject.

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Autotuning Deep Learning Models Using SAS

Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 2572 views
  • 3 likes
  • 5 in conversation