Hi everyone,
I have the following data sets:
Dataset A:
ID | DATE |
---|---|
123 | 31JAN2014 |
123 | 31JUL2014 |
123 | 31AUG2014 |
222 | 31MAR2014 |
222 | 30APR2014 |
333 | 30JUN2014 |
421 | 28FEB2014 |
235 | 31MAR2014 |
991 | 31MAY2014 |
991 | 31OCT2014 |
991 | 31DEC2014 |
And Dataset B:
ID | DATE | MEMBERSHIP |
---|---|---|
123 | 28FEB2014 | PURPLE |
123 | 30APR2014 | BRONZE |
123 | 31JUL2014 | SILVER |
123 | 31DEC2014 | GOLD |
222 | 28FEB2014 | PURPLE |
222 | 31MAR2014 | BRONZE |
222 | 31JUN2014 | SILVER |
235 | 28FEB2014 | SILVER |
235 | 31MAY2014 | GOLD |
The result I want:
Dataset Want:
ID | DATE | DATE2 | MEMBERSHIP |
---|---|---|---|
123 | 31JAN2014 | 28FEB2014 | PURPLE |
123 | 31JUL2014 | 31JUL2014 | SILVER |
123 | 31AUG2014 | 31JUL2014 | SILVER |
222 | 31MAR2014 | 31MAR2014 | BRONZE |
222 | 30APR2014 | 31MAR2014 | BRONZE |
333 | 30JUN2014 | . | |
421 | 28FEB2014 | . | |
235 | 31MAR2014 | 28FEB2014 | GOLD |
991 | 31MAY2014 | . | |
991 | 31OCT2014 | . | |
911 | 31DEC2014 | . |
Essentially I want to check a customers status as close as possible to dates from table A.
I have done the following code:
proc sql;
create table finaldsn as
select distinct a.ID,
a.DATE,
b.DATE2,
b.MEMBERSHIP
, a.DATE-b.DATE2 as days
, abs(calculated days) as absdays
, min(calculated absdays)as close
from DatasetA as a
full join
DatasetB as b
on a.ID=b.ID
where a.DATE ne . and b.DATE2 ne .
and b.ID in (select distinct ID
from DatasetA)
group by a.ID, a.Date
having calculated absdays=calculated close
order by ID, Date, close
;
quit;
data want;
set finaldsn;
by ID Date;
if first.ID;
run;
But this only gives the first entry of each unique ID regardless of the date associated with it.
Anyone got ideas how to work around it ?
Thank you .
data a; input id date :date9.; format date date9.; cards; 123 31JAN2014 123 31JUL2014 123 31AUG2014 222 31MAR2014 222 30APR2014 333 30JUN2014 421 28FEB2014 235 31MAR2014 991 31MAY2014 991 31OCT2014 991 31DEC2014 ; run; data b; input id date : date9. membership $; format date date9.; cards; 123 28FEB2014 PURPLE 123 30APR2014 BRONZE 123 31JUL2014 SILVER 123 31DEC2014 GOLD 222 28FEB2014 PURPLE 222 31MAR2014 BRONZE 222 30JUN2014 SILVER 235 28FEB2014 SILVER 235 31MAY2014 GOLD ; run; proc sql; create table want(drop=dif) as select a.*,b.date as date2,b.membership,abs(a.date-b.date) as dif from a left join b on a.id=b.id group by a.id,a.date having calculated dif=min(calculated dif); quit;
Xia Keshan
One solution I'm now working on is to make a new a unique field to identify the observation:
-> NewID = ID + DATE (for example 12331JAN2014)
-> And then apply the first. statement to the new identifier, NewID.
Although I don't think this is a very efficient way regarding the problem, especially if you are working with a large data set ?
I'd try to start with expanding the first dataset:
%let offset_max=100;* or whatever suits your needs;
data int1;
set a;
do offset = 1 to &offset_max;
target_date = date - offset;
output;
target_date = date + offset;
output;
end;
run;
Now sort that dataset by id, target_date and offset
If you now merge on id and date = target_date with dataset B and keep the smallest offset, you should get your intended result.
I created dataa and datab datasets changing 31JUN2014 to 30JUN2014. The only problem I had was with ID 235. Your 'want' says 235 should be gold. My results say it should be SILVER. I arbitrarily gave days the value of 1000000 when it would calculate as missing.
proc sql;
drop table testit;
create table testit as
select distinct a. ID,
a.date format=date9.,
case
when b.id = ' ' then .
else b.date2 end as date2 format=date9.,
case
when b.id = ' ' then ' '
else b.membership end as MEMBERSHIP,
case
when b.id = ' ' then 1000000
else a.DATE-b.DATE2 end as days
, abs(calculated days) as absdays
, min(calculated absdays)as close
from DataA as a full join
DataB as b
on a.ID=b.ID
group by a.ID, a.date
order by id, close, date, absdays, date2
;
quit;
proc sort data=testit;
by id date close absdays;
run;
data testit3;
set testit;
by id date close absdays;
if first.date then output;
else delete;
keep id date date2 membership;
run;
proc print data=testit3;run;
ID DATE date2 MEMBERSHIP days absdays close
235 31MAR2014 28FEB2014 SILVER 31 31 31
235 31MAR2014 31MAY2014 GOLD -61 61 31
Obs ID DATE date2 MEMBERSHIP
1 123 31JAN2014 28FEB2014 PURPLE
2 123 31JUL2014 31JUL2014 SILVER
3 123 31AUG2014 31JUL2014 SILVER
4 222 31MAR2014 31MAR2014 BRONZE
5 222 30APR2014 31MAR2014 BRONZE
6 235 31MAR2014 28FEB2014 SILVER
7 333 30JUN2014 .
8 421 28FEB2014 .
9 991 31MAY2014 .
10 991 31OCT2014 .
11 991 31DEC2014 .
data a; input id date :date9.; format date date9.; cards; 123 31JAN2014 123 31JUL2014 123 31AUG2014 222 31MAR2014 222 30APR2014 333 30JUN2014 421 28FEB2014 235 31MAR2014 991 31MAY2014 991 31OCT2014 991 31DEC2014 ; run; data b; input id date : date9. membership $; format date date9.; cards; 123 28FEB2014 PURPLE 123 30APR2014 BRONZE 123 31JUL2014 SILVER 123 31DEC2014 GOLD 222 28FEB2014 PURPLE 222 31MAR2014 BRONZE 222 30JUN2014 SILVER 235 28FEB2014 SILVER 235 31MAY2014 GOLD ; run; proc sql; create table want(drop=dif) as select a.*,b.date as date2,b.membership,abs(a.date-b.date) as dif from a left join b on a.id=b.id group by a.id,a.date having calculated dif=min(calculated dif); quit;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.