BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
luvscandy27
Quartz | Level 8

Hello everyone, 

I have a dataset with millions of records. What I would like to do is keep the most recent date by id unless 

the other 4 rating variables are more complete. Any assistance would be appreciated .This is the sample dataset:

 

id

date

rate_1

rate_2

rate_3

rate_4

100

3/3/2023

1

1

1

1

100

2/27/2023

1

1

1

1

100

1/21/2023

1

1

1

1

100

12/28/2022

1

1

1

1

200

3/4/2023

.

.

.

.

200

1/21/2023

1

1

1

1

 

and this is what I would like to end up with :

id

date

rate_1

rate_2

rate_3

rate_4

100

3/3/2023

1

1

1

1

200

1/21/2023

1

1

1

1

 

The code for the sample dataset is: 

data test1;
infile datalines delimiter = ',';
input id $ date:mmddyy10. rate_1 rate_2 rate_3 rate_4 ;
format date mmddyy10.;
datalines;
100,03/03/2023,1,1,1,1,
100,02/27/2023,1,1,1,1,
100,01/21/2023,1,1,1,1,
100,12/28/2022,1,1,1,1,
200,03/04/2023,.,.,.,.,
200,01/21/2023,1,1,1,1
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

If it's not too big to sort, you could compute N and then sort the data by ID N Date, and grab the last record per ID.  Something like:

 

data test2 ;
  set test1 ;
  n=n(of rate: ) ;
run ;

proc sort data=test2 ;
  by id n date ;
run ;

data want ;
  set test2 ;
  by id n date ;
  if last.id ;
  drop n ;
run ;

View solution in original post

3 REPLIES 3
Quentin
Super User

If it's not too big to sort, you could compute N and then sort the data by ID N Date, and grab the last record per ID.  Something like:

 

data test2 ;
  set test1 ;
  n=n(of rate: ) ;
run ;

proc sort data=test2 ;
  by id n date ;
run ;

data want ;
  set test2 ;
  by id n date ;
  if last.id ;
  drop n ;
run ;
FreelanceReinh
Jade | Level 19

Hello @luvscandy27,

 

Alternatively, you can use a single PROC SQL step:

proc sql;
create table want(drop=c) as
select *, 1e6*n(rate_1,rate_2,rate_3,rate_4)+date as c from test1
group by id
having c=max(c);
quit;
vijaypratap0195
Obsidian | Level 7

I believe you can achieve this by using proc sort and last.id-

 

Try this-

proc sort data=test1; by id date; run;
 
data tmp;
set test1;
by id;
if sum(of rate_:) = 4;
run;
 
proc sort data=tmp; by id ; run;
  
data tmp1;
set tmp;
by id;
if last.id;
run;

-Vijay

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1084 views
  • 1 like
  • 4 in conversation