Hello everyone,
I have a dataset with millions of records. What I would like to do is keep the most recent date by id unless
the other 4 rating variables are more complete. Any assistance would be appreciated .This is the sample dataset:
id |
date |
rate_1 |
rate_2 |
rate_3 |
rate_4 |
100 |
3/3/2023 |
1 |
1 |
1 |
1 |
100 |
2/27/2023 |
1 |
1 |
1 |
1 |
100 |
1/21/2023 |
1 |
1 |
1 |
1 |
100 |
12/28/2022 |
1 |
1 |
1 |
1 |
200 |
3/4/2023 |
. |
. |
. |
. |
200 |
1/21/2023 |
1 |
1 |
1 |
1 |
and this is what I would like to end up with :
id |
date |
rate_1 |
rate_2 |
rate_3 |
rate_4 |
100 |
3/3/2023 |
1 |
1 |
1 |
1 |
200 |
1/21/2023 |
1 |
1 |
1 |
1 |
The code for the sample dataset is:
data test1;
infile datalines delimiter = ',';
input id $ date:mmddyy10. rate_1 rate_2 rate_3 rate_4 ;
format date mmddyy10.;
datalines;
100,03/03/2023,1,1,1,1,
100,02/27/2023,1,1,1,1,
100,01/21/2023,1,1,1,1,
100,12/28/2022,1,1,1,1,
200,03/04/2023,.,.,.,.,
200,01/21/2023,1,1,1,1
;
run;
If it's not too big to sort, you could compute N and then sort the data by ID N Date, and grab the last record per ID. Something like:
data test2 ;
set test1 ;
n=n(of rate: ) ;
run ;
proc sort data=test2 ;
by id n date ;
run ;
data want ;
set test2 ;
by id n date ;
if last.id ;
drop n ;
run ;
If it's not too big to sort, you could compute N and then sort the data by ID N Date, and grab the last record per ID. Something like:
data test2 ;
set test1 ;
n=n(of rate: ) ;
run ;
proc sort data=test2 ;
by id n date ;
run ;
data want ;
set test2 ;
by id n date ;
if last.id ;
drop n ;
run ;
Hello @luvscandy27,
Alternatively, you can use a single PROC SQL step:
proc sql;
create table want(drop=c) as
select *, 1e6*n(rate_1,rate_2,rate_3,rate_4)+date as c from test1
group by id
having c=max(c);
quit;
I believe you can achieve this by using proc sort and last.id-
Try this-
proc sort data=test1; by id date; run; data tmp; set test1; by id; if sum(of rate_:) = 4; run; proc sort data=tmp; by id ; run; data tmp1; set tmp; by id; if last.id; run;
-Vijay
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.