BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rakeshvvv
Quartz | Level 8

Hi Friends,

 

Can someone help me with below query. I have two datasets X and Y.

 

X has two variables ID and date

Y has two variables ID and date

 

ID is the key variable and can have different number of records in X and Y for the same ID.

 

For every ID and DATE in X, Date should be compared  with Y and should output only those where X.DATE=+_3 days.Y.date.

 

I Hope i explained and thanks in advance.

Sample data

x:

 

140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 

 

Y:

140011001 2011-09-09
140011001 2011-09-29

 

OUTPUT:

 

140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data x;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 
;
run;
data y;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09
140011001 2011-09-29
;
run;
proc sql;
create table want as
 select x.*
  from x,y
   where x.id eq y.id and x.date between y.date-3 and y.date+3;
quit;


View solution in original post

6 REPLIES 6
Reeza
Super User

It doesn't look like your data matches your rule, 2011-09-20 +-3 days is 2011-09-17 and 2011-09-23 and those dates aren't in the second table. Please clarify your requirements.

ballardw
Super User

Are your date variables SAS Date value or character? The comparisons will work much better with SAS data values.

 

Also your requirement seems to say that the X date is 3 days from the Y date but your example out put shows the X date 2011-09-20 to be 9 days from the displayed Y date of 2011-09-20. You may need to expand on the detales of the rule for us.

rakeshvvv
Quartz | Level 8

The data is correct.....date is a character variable in yyyymmdd format.

 

140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 

 

Y:

140011001 2011-09-09
140011001 2011-09-29

 

OUTPUT:

 

140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 

 

Reason for output: As the first 6 records in X(2011-09-09  2011-09-11 ) within 3 days of Y(2011-09-09). We should output those records.

 

140011001 2011-09-20= it is beyond 3 days from Y and hence we are not outputting those records 

Ksharp
Super User
data x;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 
;
run;
data y;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09
140011001 2011-09-29
;
run;
proc sql;
create table want as
 select x.*
  from x,y
   where x.id eq y.id and x.date between y.date-3 and y.date+3;
quit;


JohndeKroon
Obsidian | Level 7

Wouldn't it be better let the join take care of merging the two tables, and let the where-statement take care of the proper selection?

 

proc sql;

create table want as

select x.*

from x as a

left join y as b on (a.id = b.id)

where abs(a.date - b.date) <= 3;

quit;

 

Personally I prefer using abs(x-y)<=3 - makes it more explicit where the boundaries of the selection are...

FreelanceReinh
Jade | Level 19

Either way, you would need to consider the case that observations from X would be selected multiple times by PROC SQL if there were two or more "matching" Y records. For example, if Y contained a third observation with ID='140011001' and date='2011-09-10', the current result dataset would double from 6 to 12 observations. Of course, this could be avoided by selecting DISTINCT ... including an additional variable, e.g. a record number, which uniquely identifies observations in X.


Also, please note that with your character dates, the BETWEEN condition would read
input(x.date,yymmdd10.) between input(y.date,yymmdd10.)-3 and input(y.date,yymmdd10.)+3.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1167 views
  • 2 likes
  • 6 in conversation