I have dataset in the form
DATA details;
input id doj$;
datalines;
1 11/06/2020
1 18/07/2020
1 18/07/2020
2 12/06/2020
2 19/06/2020
3 13/06/2020
4 14/06/2020
5 15/06/2020
5 15/06/2020
;
Run;
(this dataset is sorted by ID and doj).
I would like an output table that gives:
ID | doj |
1 | 18/07/2020 |
1 | 18/07/2020 |
2 | 19/06/2020 |
3 | 13/06/2020 |
4 | 14/06/2020 |
5 | 15/06/2020 |
5 | 15/06/2020 |
It looks at each unique ID, and then takes the latest one based on "doj" and takes multiple records if the "latest" is repeated. This is a simple example, but there can be more columns in the source table, and also many more dates for each id.
DATA details;
input @1 id $2. @3 doj ddmmyy10. ;
format doj ddmmyy10.;
datalines;
1 11/06/2020
1 18/07/2020
1 18/07/2020
2 12/06/2020
2 19/06/2020
3 13/06/2020
4 14/06/2020
5 15/06/2020
5 15/06/2020
;
Run;
proc sql;
create table Want as
select A.*
from details as A
inner join
(select id
,max(doj) as doj_max
from details
group by id
) as B
on A.id = B.id
and A.doj = B.doj_max
;
run;
Do not store dates as character strings, store them as SAS date values.
DATA details;
input id doj :ddmmyy10.;
format doj yymmdd10.;
datalines;
1 11/06/2020
1 18/07/2020
1 18/07/2020
2 12/06/2020
2 19/06/2020
3 13/06/2020
4 14/06/2020
5 15/06/2020
5 15/06/2020
;
data want;
do until (last.id);
set details;
by id;
maxdate = max(doj,maxdate);
end;
do until (last.id);
set details;
by id;
if doj = maxdate then output;
end;
drop maxdate;
run;
@yellowredyellow wrote:
I have dataset in the form
DATA details; input id doj$; datalines; 1 11/06/2020 1 18/07/2020 1 18/07/2020 2 12/06/2020 2 19/06/2020 3 13/06/2020 4 14/06/2020 5 15/06/2020 5 15/06/2020 ; Run;
(this dataset is sorted by ID and doj).
I would like an output table that gives:
ID doj 1 18/07/2020 1 18/07/2020 2 19/06/2020 3 13/06/2020 4 14/06/2020 5 15/06/2020 5 15/06/2020
It looks at each unique ID, and then takes the latest one based on "doj" and takes multiple records if the "latest" is repeated. This is a simple example, but there can be more columns in the source table, and also many more dates for each id.
Did you look at the result from running your example data step code? Proc print for that data set yields:
id | doj |
---|---|
1 | 11/06/20 |
1 | 18/07/20 |
1 | 18/07/20 |
2 | 12/06/20 |
2 | 19/06/20 |
3 | 13/06/20 |
4 | 14/06/20 |
5 | 15/06/20 |
5 | 15/06/20 |
So, not only are you using CHARACTER values for something that should be a SAS date value, you don't even read the entire value so there is no way, with your data step, to generate the desired output as you have lost the last to characters of the year.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.