BookmarkSubscribeRSS Feed
danwarags
Obsidian | Level 7

Hello Everyone,

 

I have a sample dataset shown below. Each ID has different dates. I want to choose the observations within 15day period from the oldest date for each ID. For the observations with only one date per ID should appear in the output. For example in the below dataset I have the dates sorted by ID and date. 

 

DateID
1/2/20131
1/3/20131
1/25/20131
4/25/20142
2/6/20143
2/8/20143
3/8/20143
5/7/20134
4/4/20135
4/4/20135
6/5/20135
5/9/20146
8/16/20146

 

I need the output as shown below. 

 

1/2/20131
1/3/20131
4/25/20142
2/6/20143
2/8/20143
5/7/20134
4/4/20135
4/4/20135

 

In the above output dataset, For the first ID, the third observation is eliminated as the date difference between the oldest date and the third observation date was  above 15 days (i.e 23 day gap). For the sixth ID, the date difference was above 15days so both the observations got deleted. Second and Fourth ID were not changed as it had only one date per ID. 

 

Please guide me in coding this. Thank you in advance.

9 REPLIES 9
ballardw
Super User

First question: Are your dates actual SAS date values or character strings?

danwarags
Obsidian | Level 7
Thanks for quick reply ballardw. The dates are actual SAS date values.
ballardw
Super User

Neither 1/2/2013 nor 1/3/2013 are within 15 days of 1/25/2013. So is your rule actually within 15 days or something else?

Note that 2/6/2014 and 2/8/2014 are also not within 15 of 3/8/2014 and 4/4/2013 is not within 15 of 6/5/2013.

 

 

Also for Id 1 and 3 you do not include the maximum valued date but for Id 2 you do. Is the rule to do that if there is only a single value?

danwarags
Obsidian | Level 7
Hello Ballardw,

Thanks for responding. The dates are in DDMMYYYY format. So for the first ID, its Jan 2,2013 and Jan 3,2013.

My intention was to include all the IDs with single date values. So, I included ID 2 and ID 4 in the output.
ballardw
Super User

Not all of your dates are dd/mm/yyyy

Look at:

1/25/2013 1

4/25/2014 2

 

25 is not a valid month.

So the first step is going to make sure your data is what you think it is.

 

This may give a starting point. The two SQL steps could be combined but make sure they are working as intended. The first step gets the latest date for each ID. Then match to the dates within the 15 day window. Exercise for the interested reader to find the Ids with only one record and append to the result.

proc sql;
   create table work.start as
   select id, date 
   from have 
   group by id
   having date=Max(date);
quit;

proc sql;
   create table work.most as
   select b.id, b.date 
   from work.start as a left join have as b
        on a.id = b.id and abs(a.date-b.date) lt 15;
quit;

danwarags
Obsidian | Level 7

Sorry for the confusion Ballardw. That was my mistake. The date is actually in dd/mm/yyyy format.The first step should give the oldest date (first date) for each ID. I used min instead of max. The next sept should find the dates which are within the 15days from the oldest date. Unfortunately, the second code you suggested does not give me the actual results I wanted.

 

The output should satisty two conditions:

 

1. If there is only one date per each ID then keep in the output. (As in second and fourth ID)

2. If there are two or more dates per each ID the difference between the oldest date (i.e first date) and the other dates should not exceed 15 days. 

 

 

For clear understanding, the same dataset is formatted as below.

 

DateIDDate diffCondition
1/2/1310satisfied
1/3/1311satified
1/25/13123not satified
4/25/1420satisfied
2/6/1430satified
2/8/1432satified
3/8/1432satified
5/7/1340satified
4/4/1350satified
4/4/1350satified
6/5/13531not satified
5/9/1460satified
8/16/14697not satified

 

 

So, the dataset should output only the conditions which are satified. The output should look like below.

 

DateID
1/2/131
1/3/131
4/25/142
2/6/143
2/8/143
3/8/143
5/7/134
4/4/135
4/4/135
5/9/146

 

Thank you

 

Ksharp
Super User
data have;
infile cards expandtabs truncover;
input Date : mmddyy10. ID;
format Date mmddyy10.;
cards;
1/2/2013	1
1/3/2013	1
1/25/2013	1
4/25/2014	2
2/6/2014	3
2/8/2014	3
3/8/2014	3
5/7/2013	4
4/4/2013	5
4/4/2013	5
6/5/2013	5
5/9/2014	6
8/16/2014	6
;
run;
data temp;
 set have;
 by id;
 retain temp;
 if first.id then do; temp=date;group=0;end;
 dif=date-temp;
 if dif gt 15 then group+1;
 drop temp dif;
run;
data want;
 set temp;
 by id group;
 if (first.id and last.id) or (not (first.group and last.group));
run;
danwarags
Obsidian | Level 7

Hello Keshan,

Thanks for the solution. The code is working for the dataset I provided. But when tried with a bigger dataset this is not working. I think if we create a separate variable which shows the difference between the dates of first.id and last.id would work.

 

The rule is
Difference between the first observation of first ID and the other observations of the first ID should not exceed 15 days.

 

 

Ksharp
Super User

Yes. I have already calculated that difference - DIF .

 

drop temp dif;  ==>
drop temp;

 Post the data that did not work for you and the output you want to see.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1062 views
  • 0 likes
  • 3 in conversation