BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
luvscandy27
Quartz | Level 8

Hello everyone, 

I have a dataset with millions of records. What I would like to do is keep the most recent date by id unless 

the other 4 rating variables are more complete. Any assistance would be appreciated .This is the sample dataset:

 

id

date

rate_1

rate_2

rate_3

rate_4

100

3/3/2023

1

1

1

1

100

2/27/2023

1

1

1

1

100

1/21/2023

1

1

1

1

100

12/28/2022

1

1

1

1

200

3/4/2023

.

.

.

.

200

1/21/2023

1

1

1

1

 

and this is what I would like to end up with :

id

date

rate_1

rate_2

rate_3

rate_4

100

3/3/2023

1

1

1

1

200

1/21/2023

1

1

1

1

 

The code for the sample dataset is: 

data test1;
infile datalines delimiter = ',';
input id $ date:mmddyy10. rate_1 rate_2 rate_3 rate_4 ;
format date mmddyy10.;
datalines;
100,03/03/2023,1,1,1,1,
100,02/27/2023,1,1,1,1,
100,01/21/2023,1,1,1,1,
100,12/28/2022,1,1,1,1,
200,03/04/2023,.,.,.,.,
200,01/21/2023,1,1,1,1
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

If it's not too big to sort, you could compute N and then sort the data by ID N Date, and grab the last record per ID.  Something like:

 

data test2 ;
  set test1 ;
  n=n(of rate: ) ;
run ;

proc sort data=test2 ;
  by id n date ;
run ;

data want ;
  set test2 ;
  by id n date ;
  if last.id ;
  drop n ;
run ;
The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.

View solution in original post

3 REPLIES 3
Quentin
Super User

If it's not too big to sort, you could compute N and then sort the data by ID N Date, and grab the last record per ID.  Something like:

 

data test2 ;
  set test1 ;
  n=n(of rate: ) ;
run ;

proc sort data=test2 ;
  by id n date ;
run ;

data want ;
  set test2 ;
  by id n date ;
  if last.id ;
  drop n ;
run ;
The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
FreelanceReinh
Jade | Level 19

Hello @luvscandy27,

 

Alternatively, you can use a single PROC SQL step:

proc sql;
create table want(drop=c) as
select *, 1e6*n(rate_1,rate_2,rate_3,rate_4)+date as c from test1
group by id
having c=max(c);
quit;
vijaypratap0195
Obsidian | Level 7

I believe you can achieve this by using proc sort and last.id-

 

Try this-

proc sort data=test1; by id date; run;
 
data tmp;
set test1;
by id;
if sum(of rate_:) = 4;
run;
 
proc sort data=tmp; by id ; run;
  
data tmp1;
set tmp;
by id;
if last.id;
run;

-Vijay

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 446 views
  • 1 like
  • 4 in conversation