Help using Base SAS procedures

SAS QUERY

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 145
Accepted Solution

SAS QUERY

Hi Friends,

 

Can someone help me with below query. I have two datasets X and Y.

 

X has two variables ID and date

Y has two variables ID and date

 

ID is the key variable and can have different number of records in X and Y for the same ID.

 

For every ID and DATE in X, Date should be compared  with Y and should output only those where X.DATE=+_3 days.Y.date.

 

I Hope i explained and thanks in advance.

Sample data

x:

 

140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 

 

Y:

140011001 2011-09-09
140011001 2011-09-29

 

OUTPUT:

 

140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 

 


Accepted Solutions
Solution
‎01-12-2016 10:42 AM
Super User
Posts: 9,671

Re: SAS QUERY

data x;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 
;
run;
data y;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09
140011001 2011-09-29
;
run;
proc sql;
create table want as
 select x.*
  from x,y
   where x.id eq y.id and x.date between y.date-3 and y.date+3;
quit;


View solution in original post


All Replies
Super User
Posts: 17,784

Re: SAS QUERY

It doesn't look like your data matches your rule, 2011-09-20 +-3 days is 2011-09-17 and 2011-09-23 and those dates aren't in the second table. Please clarify your requirements.

Super User
Posts: 10,483

Re: SAS QUERY

Are your date variables SAS Date value or character? The comparisons will work much better with SAS data values.

 

Also your requirement seems to say that the X date is 3 days from the Y date but your example out put shows the X date 2011-09-20 to be 9 days from the displayed Y date of 2011-09-20. You may need to expand on the detales of the rule for us.

Frequent Contributor
Posts: 145

Re: SAS QUERY

The data is correct.....date is a character variable in yyyymmdd format.

 

140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 

 

Y:

140011001 2011-09-09
140011001 2011-09-29

 

OUTPUT:

 

140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 

 

Reason for output: As the first 6 records in X(2011-09-09  2011-09-11 ) within 3 days of Y(2011-09-09). We should output those records.

 

140011001 2011-09-20= it is beyond 3 days from Y and hence we are not outputting those records 

Solution
‎01-12-2016 10:42 AM
Super User
Posts: 9,671

Re: SAS QUERY

data x;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-11 
140011001 2011-09-20 
140011001 2011-09-20 
140011001 2011-09-20 
;
run;
data y;
input id : $20. date yymmdd12.;
format date yymmdd.;
cards;
140011001 2011-09-09
140011001 2011-09-29
;
run;
proc sql;
create table want as
 select x.*
  from x,y
   where x.id eq y.id and x.date between y.date-3 and y.date+3;
quit;


Occasional Contributor
Posts: 10

Re: SAS QUERY

Wouldn't it be better let the join take care of merging the two tables, and let the where-statement take care of the proper selection?

 

proc sql;

create table want as

select x.*

from x as a

left join y as b on (a.id = b.id)

where abs(a.date - b.date) <= 3;

quit;

 

Personally I prefer using abs(x-y)<=3 - makes it more explicit where the boundaries of the selection are...

Trusted Advisor
Posts: 1,115

Re: SAS QUERY

Either way, you would need to consider the case that observations from X would be selected multiple times by PROC SQL if there were two or more "matching" Y records. For example, if Y contained a third observation with ID='140011001' and date='2011-09-10', the current result dataset would double from 6 to 12 observations. Of course, this could be avoided by selecting DISTINCT ... including an additional variable, e.g. a record number, which uniquely identifies observations in X.


Also, please note that with your character dates, the BETWEEN condition would read
input(x.date,yymmdd10.) between input(y.date,yymmdd10.)-3 and input(y.date,yymmdd10.)+3.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 413 views
  • 2 likes
  • 6 in conversation