Hi everyone,
I have the following data sets:
Dataset A:
ID | DATE |
---|---|
123 | 31JAN2014 |
123 | 31JUL2014 |
123 | 31AUG2014 |
222 | 31MAR2014 |
222 | 30APR2014 |
333 | 30JUN2014 |
421 | 28FEB2014 |
235 | 31MAR2014 |
991 | 31MAY2014 |
991 | 31OCT2014 |
991 | 31DEC2014 |
And Dataset B:
ID | DATE | MEMBERSHIP |
---|---|---|
123 | 28FEB2014 | PURPLE |
123 | 30APR2014 | BRONZE |
123 | 31JUL2014 | SILVER |
123 | 31DEC2014 | GOLD |
222 | 28FEB2014 | PURPLE |
222 | 31MAR2014 | BRONZE |
222 | 31JUN2014 | SILVER |
235 | 28FEB2014 | SILVER |
235 | 31MAY2014 | GOLD |
The result I want:
Dataset Want:
ID | DATE | DATE2 | MEMBERSHIP |
---|---|---|---|
123 | 31JAN2014 | 28FEB2014 | PURPLE |
123 | 31JUL2014 | 31JUL2014 | SILVER |
123 | 31AUG2014 | 31JUL2014 | SILVER |
222 | 31MAR2014 | 31MAR2014 | BRONZE |
222 | 30APR2014 | 31MAR2014 | BRONZE |
333 | 30JUN2014 | . | |
421 | 28FEB2014 | . | |
235 | 31MAR2014 | 28FEB2014 | GOLD |
991 | 31MAY2014 | . | |
991 | 31OCT2014 | . | |
911 | 31DEC2014 | . |
Essentially I want to check a customers status as close as possible to dates from table A.
I have done the following code:
proc sql;
create table finaldsn as
select distinct a.ID,
a.DATE,
b.DATE2,
b.MEMBERSHIP
, a.DATE-b.DATE2 as days
, abs(calculated days) as absdays
, min(calculated absdays)as close
from DatasetA as a
full join
DatasetB as b
on a.ID=b.ID
where a.DATE ne . and b.DATE2 ne .
and b.ID in (select distinct ID
from DatasetA)
group by a.ID, a.Date
having calculated absdays=calculated close
order by ID, Date, close
;
quit;
data want;
set finaldsn;
by ID Date;
if first.ID;
run;
But this only gives the first entry of each unique ID regardless of the date associated with it.
Anyone got ideas how to work around it ?
Thank you .
data a; input id date :date9.; format date date9.; cards; 123 31JAN2014 123 31JUL2014 123 31AUG2014 222 31MAR2014 222 30APR2014 333 30JUN2014 421 28FEB2014 235 31MAR2014 991 31MAY2014 991 31OCT2014 991 31DEC2014 ; run; data b; input id date : date9. membership $; format date date9.; cards; 123 28FEB2014 PURPLE 123 30APR2014 BRONZE 123 31JUL2014 SILVER 123 31DEC2014 GOLD 222 28FEB2014 PURPLE 222 31MAR2014 BRONZE 222 30JUN2014 SILVER 235 28FEB2014 SILVER 235 31MAY2014 GOLD ; run; proc sql; create table want(drop=dif) as select a.*,b.date as date2,b.membership,abs(a.date-b.date) as dif from a left join b on a.id=b.id group by a.id,a.date having calculated dif=min(calculated dif); quit;
Xia Keshan
One solution I'm now working on is to make a new a unique field to identify the observation:
-> NewID = ID + DATE (for example 12331JAN2014)
-> And then apply the first. statement to the new identifier, NewID.
Although I don't think this is a very efficient way regarding the problem, especially if you are working with a large data set ?
I'd try to start with expanding the first dataset:
%let offset_max=100;* or whatever suits your needs;
data int1;
set a;
do offset = 1 to &offset_max;
target_date = date - offset;
output;
target_date = date + offset;
output;
end;
run;
Now sort that dataset by id, target_date and offset
If you now merge on id and date = target_date with dataset B and keep the smallest offset, you should get your intended result.
I created dataa and datab datasets changing 31JUN2014 to 30JUN2014. The only problem I had was with ID 235. Your 'want' says 235 should be gold. My results say it should be SILVER. I arbitrarily gave days the value of 1000000 when it would calculate as missing.
proc sql;
drop table testit;
create table testit as
select distinct a. ID,
a.date format=date9.,
case
when b.id = ' ' then .
else b.date2 end as date2 format=date9.,
case
when b.id = ' ' then ' '
else b.membership end as MEMBERSHIP,
case
when b.id = ' ' then 1000000
else a.DATE-b.DATE2 end as days
, abs(calculated days) as absdays
, min(calculated absdays)as close
from DataA as a full join
DataB as b
on a.ID=b.ID
group by a.ID, a.date
order by id, close, date, absdays, date2
;
quit;
proc sort data=testit;
by id date close absdays;
run;
data testit3;
set testit;
by id date close absdays;
if first.date then output;
else delete;
keep id date date2 membership;
run;
proc print data=testit3;run;
ID DATE date2 MEMBERSHIP days absdays close
235 31MAR2014 28FEB2014 SILVER 31 31 31
235 31MAR2014 31MAY2014 GOLD -61 61 31
Obs ID DATE date2 MEMBERSHIP
1 123 31JAN2014 28FEB2014 PURPLE
2 123 31JUL2014 31JUL2014 SILVER
3 123 31AUG2014 31JUL2014 SILVER
4 222 31MAR2014 31MAR2014 BRONZE
5 222 30APR2014 31MAR2014 BRONZE
6 235 31MAR2014 28FEB2014 SILVER
7 333 30JUN2014 .
8 421 28FEB2014 .
9 991 31MAY2014 .
10 991 31OCT2014 .
11 991 31DEC2014 .
data a; input id date :date9.; format date date9.; cards; 123 31JAN2014 123 31JUL2014 123 31AUG2014 222 31MAR2014 222 30APR2014 333 30JUN2014 421 28FEB2014 235 31MAR2014 991 31MAY2014 991 31OCT2014 991 31DEC2014 ; run; data b; input id date : date9. membership $; format date date9.; cards; 123 28FEB2014 PURPLE 123 30APR2014 BRONZE 123 31JUL2014 SILVER 123 31DEC2014 GOLD 222 28FEB2014 PURPLE 222 31MAR2014 BRONZE 222 30JUN2014 SILVER 235 28FEB2014 SILVER 235 31MAY2014 GOLD ; run; proc sql; create table want(drop=dif) as select a.*,b.date as date2,b.membership,abs(a.date-b.date) as dif from a left join b on a.id=b.id group by a.id,a.date having calculated dif=min(calculated dif); quit;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.