BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cadams47
Fluorite | Level 6

I have data that looks like this:

Obs    Date    CompanyID

1         7/15/15        1

2         12/31/01      2

3         12/31/02      2

4         3/31/05        3

5         5/15/05        3

 

What I need to do is keep the observations 1-4 and drop (or at least flag) observation 5. Observation 2 & 3 are not duplicates because there is a year in between those dates. Observations 4 and 5 will be treated as duplicates because they fall less than 60 days apart. 

Is the retain function appropriate?

1 ACCEPTED SOLUTION

Accepted Solutions
9 REPLIES 9
Kurt_Bremser
Super User

Use the lag() function:

proc sort data=have;
by date;
run;
/* just to be sure */

data want;
set have;
if date - lag(date) le 60 then delete;
run;
cadams47
Fluorite | Level 6

But does that take into account each company? A company might have one observation or it might have three. How can I make those decisions by company?

Kurt_Bremser
Super User

Just modify the sort and use by-processing in the data step:

proc sort data=have;
by company date;
run;

data want;
set have;
by company;
if date - lag(date) le 60 and not first.company then delete;
run;
cadams47
Fluorite | Level 6

I'm still hesitant because a company could have four observations, with two being close in date and two being 60+ days:

 

Date         CompanyID

5/15/05     4

6/30/05     4

6/30/09     4

6/30/10     4

 

In this scenario, if I relied on first.company, it would delete three of them. Only the first one would need to go.

Kurt_Bremser
Super User

It would keep three observations:

data have;
input date :mmddyy8. companyid;
format date mmddyy8.;
cards;
5/15/05 4
6/30/05 4
6/30/09 4
6/30/10 4
;
run;

data want;
set have;
by companyid;
if date - lag(date) le 60 and not first.companyid then delete;
run;

proc print noobs;
run;

Result:

    date    companyid

05/15/05        4    
06/30/09        4    
06/30/10        4    
mkeintz
PROC Star

 

 

This program outputs all records in which the date > cutdate.  And cutdate is set to 60 days after the last "valid" date.  Except for the first case of company, when cutdate is set to missing (which is lower that all valid numeric values):

 

data want (drop=cutdate);
  retain cutdate;
   set have;
   by companyid;
   if first.companyid then cutdate=.;
   if date>cutdate;
   cutdate=date+60;
run;

 

regards,

Mark

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
cadams47
Fluorite | Level 6

Thank you everyone!

cadams47
Fluorite | Level 6

I will once I figure out which one is better. Both left me with another issue.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 970 views
  • 2 likes
  • 3 in conversation