Hi Friends,
Can someone help me with below query. I have two datasets X and Y.
X has two variables ID and date
Y has two variables ID and date
ID is the key variable and can have different number of records in X and Y for the same ID.
For every ID and DATE in X, Date should be compared with Y and should output only those where X.DATE=+_3 days.Y.date.
I Hope i explained and thanks in advance.
Sample data
x:
140011001 2011-09-09
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-20
140011001 2011-09-20
140011001 2011-09-20
Y:
140011001 2011-09-09
140011001 2011-09-29
OUTPUT:
140011001 2011-09-20
140011001 2011-09-20
140011001 2011-09-20
data x;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-20
140011001 2011-09-20
140011001 2011-09-20
;
run;
data y;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09
140011001 2011-09-29
;
run;
proc sql;
create table want as
select x.*
from x,y
where x.id eq y.id and x.date between y.date-3 and y.date+3;
quit;
It doesn't look like your data matches your rule, 2011-09-20 +-3 days is 2011-09-17 and 2011-09-23 and those dates aren't in the second table. Please clarify your requirements.
Are your date variables SAS Date value or character? The comparisons will work much better with SAS data values.
Also your requirement seems to say that the X date is 3 days from the Y date but your example out put shows the X date 2011-09-20 to be 9 days from the displayed Y date of 2011-09-20. You may need to expand on the detales of the rule for us.
The data is correct.....date is a character variable in yyyymmdd format.
140011001 2011-09-09
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-20
140011001 2011-09-20
140011001 2011-09-20
Y:
140011001 2011-09-09
140011001 2011-09-29
OUTPUT:
140011001 2011-09-09
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
Reason for output: As the first 6 records in X(2011-09-09 2011-09-11 ) within 3 days of Y(2011-09-09). We should output those records.
140011001 2011-09-20= it is beyond 3 days from Y and hence we are not outputting those records
data x;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-11
140011001 2011-09-20
140011001 2011-09-20
140011001 2011-09-20
;
run;
data y;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09
140011001 2011-09-29
;
run;
proc sql;
create table want as
select x.*
from x,y
where x.id eq y.id and x.date between y.date-3 and y.date+3;
quit;
Wouldn't it be better let the join take care of merging the two tables, and let the where-statement take care of the proper selection?
proc sql;
create table want as
select x.*
from x as a
left join y as b on (a.id = b.id)
where abs(a.date - b.date) <= 3;
quit;
Personally I prefer using abs(x-y)<=3 - makes it more explicit where the boundaries of the selection are...
Either way, you would need to consider the case that observations from X would be selected multiple times by PROC SQL if there were two or more "matching" Y records. For example, if Y contained a third observation with ID='140011001' and date='2011-09-10', the current result dataset would double from 6 to 12 observations. Of course, this could be avoided by selecting DISTINCT ... including an additional variable, e.g. a record number, which uniquely identifies observations in X.
Also, please note that with your character dates, the BETWEEN condition would read
input(x.date,yymmdd10.) between input(y.date,yymmdd10.)-3 and input(y.date,yymmdd10.)+3.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.