I have data like table-1 and I want it like table-2
I want maximum date for all the records among the one subject.
Table-1
subject | date | |
1 | 3/8/2022 | 0 |
2 | 3/9/2022 | 0 |
2 | 3/10/2022 | 1 |
2 | 3/11/2022 | 1 |
3 | 3/12/2022 | 0 |
Table-2
subject | date | |
1 | 3/8/2022 | 0 |
2 | 3/11/2022 | 0 |
2 | 3/11/2022 | 1 |
2 | 3/11/2022 | 1 |
3 | 3/12/2022 | 0 |
Or
data have;
input subject date :mmddyy10. dummy;
format date mmddyy10.;
datalines;
1 3/8/2022 0
2 3/9/2022 0
2 3/10/2022 1
2 3/11/2022 1
3 3/12/2022 0
;
data want(drop = d);
do _N_ = 1 by 1 until (last.subject);
set have;
by subject;
if date > d then d = date;
end;
do _N_ = 1 to _N_;
set have;
date = d;
output;
end;
run;
data have;
input subject date :mmddyy10. dummy;
format date mmddyy10.;
datalines;
1 3/8/2022 0
2 3/9/2022 0
2 3/10/2022 1
2 3/11/2022 1
3 3/12/2022 0
;
proc sql;
create table want as
select subject
, max(date) as date format = mmddyy10.
, dummy
from have
group by subject
;
quit;
Result
subject date dummy 1 03/08/2022 0 2 03/11/2022 1 2 03/11/2022 0 2 03/11/2022 1 3 03/12/2022 0
Hi KalaBhairava,
Here's a quick solution:
proc sort data = table1 out=temp;
by subject descending date;
run;
data table2 (drop = h_d);
set temp;
by subject;
retain h_d;
if first.subject then h_d = date;
date = h_d;
run;
HTH,
Cheers.
Or
data have;
input subject date :mmddyy10. dummy;
format date mmddyy10.;
datalines;
1 3/8/2022 0
2 3/9/2022 0
2 3/10/2022 1
2 3/11/2022 1
3 3/12/2022 0
;
data want(drop = d);
do _N_ = 1 by 1 until (last.subject);
set have;
by subject;
if date > d then d = date;
end;
do _N_ = 1 to _N_;
set have;
date = d;
output;
end;
run;
If the data are sorted by subject:
data want;
set have (in=firstpass) have (in=secondpass);
by subject;
retain maxdate;
if first.subject then maxdate=date;
else if firstpass then maxdate=max(date,maxdate);
if secondpass;
run;
But if the data are not sorted, and sorting is too expensive then you should use @FreelanceReinh 's suggestion. (I've struck out my original content below - it was wrong).
data want; set have; if _n_=1 then do; if 0 then set have (rename=(date=maxdate)); declare hash h (dataset:'have (keep=subject date rename=(date=maxdate))',multidata:'Y',ordered:'D'); h.definekey('subject'); h.definedata('maxdate'); h.definedone(); end; h.find(); run;
Thanks, @mkeintz. I always like the elegance of your trademark "firstpass-secondpass" solutions.
I think this technique could even help in the "not sorted" case to support the hash object approach. Here's my attempt to combine the two:
data want(drop=rc);
set have (in=firstpass) have (in=secondpass);
if _n_=1 then do;
dcl hash h();
h.definekey('subject');
h.definedata('maxdate');
h.definedone();
maxdate=.;
end;
rc=h.find();
if firstpass & date>maxdate then h.replace(key: subject, data: date);
if secondpass;
format maxdate mmddyy10.;
run;
I appreciate your positive comment about the firstpass/secondpass paradigm.
But if the data is sorted, there is likely an efficiency difference compared to your suggestion. When you use it with a BY statement, as in
data want;
set have (in=firstpass) have (in=secondpass);
by subject;
.....
the data are interleaved, meaning a given ID group is read twice before proceeding to the next ID group. It'll be more efficient because that BY group data is likely sitting in memory allocated as a disk buffer after the firstpass. So the secondpass likely reads from memory rather than over the disk channel from external storage.
But if you have to do it without the BY statement, as in:
data want;
set have (in=firstpass) have (in=secondpass);
.... other code ....
it reads the ENTIRE dataset for firstpass before rereading during secondpass. You'll almost certainly be forced to implement a lot more disk activity.
Of course, my solution for unsorted data using the hash object statement declare hash h (dataset:'have'.....) would have the same disadvantage.
In that case, your solution would likely be a bit better because your hash object would take a lot less memory, having only one "row" per SUBJECT.
Thanks. Very good point about reading data from memory vs. disk. Yes, for sorted data I wouldn't have suggested this approach.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.
Find more tutorials on the SAS Users YouTube channel.