Help using Base SAS procedures

Dates and Merging.

Reply
Occasional Contributor
Posts: 16

Dates and Merging.

Hello,

I've got two data sets.

I'm trying to get these...

id,StartDate, Value,
001,01JUL2010, 6050
002,27MAY2010, 9050
003, 11DEC2010, 16500

into...

id, StartDate, EndDate, OtherValue
001, 01SEP2009,17DEC2009, 7500
001, 18DEC2009,29APR2010, 7000
001, 30APR2010,31DEC9999, 6000
002, 10MAY2010,16AUG2010, 11000
002, 17AUG2010,26NOV2010,10500
002, 27NOV2010,11APR2011,9500
002, 12APR2011,31DEC9999,9000
003, 01DEC2010,12JAN2011,20000
003, 13JAN2011, 25MAY2011,19500
003, 26MAY2011, 31DEC9999, 18990

and return the other value that would have been during that time...

id, StartDate, Value, OtherValue
001,01JUL2010, 6050, 6000
002,27MAY2010, 9050, 11000
003,11DEC2010, 16500, 20000

So I have a variable to merge on but the startdate variable in the first set needs to read in between the start and end date in the second one and then output the other value variable.

I'm not very experienced with any merging outside of indexing...

data name (index=());
set sample;
set sample2 key=/unique;
if _error_ then delete (or output...);
run;

Thanks in advance.
Super User
Posts: 10,023

Re: Dates and Merging.

Or If you like, Hash Table also do it with more efficient.


[pre]



data temp1;
infile datalines dlm=' ,';
input id $ StartDate : date9. Value;
format StartDate date9.;
datalines;
001,01JUL2010, 6050
002,27MAY2010, 9050
003, 11DEC2010, 16500
;
run;
data temp2;
infile datalines dlm=' ,';
input id $ StartDate : date9. EndDate : date9. OtherValue;
datalines;
001, 01SEP2009,17DEC2009, 7500
001, 18DEC2009,29APR2010, 7000
001, 30APR2010,31DEC9999, 6000
002, 10MAY2010,16AUG2010, 11000
002, 17AUG2010,26NOV2010,10500
002, 27NOV2010,11APR2011,9500
002, 12APR2011,31DEC9999,9000
003, 01DEC2010,12JAN2011,20000
003, 13JAN2011, 25MAY2011,19500
003, 26MAY2011, 31DEC9999, 18990
;
run;
data want(drop=_StartDate EndDate);
merge temp1 temp2(rename=(StartDate=_StartDate));
by id;
if StartDate ge _StartDate and StartDate le EndDate ;
run;
[/pre]




Ksharp
Occasional Contributor
Posts: 16

Re: Dates and Merging.

Hi KSharp,

Ran you practise script and it worked a treat, applied it to what I was working on and it produced some kind of monster.

Reason I suspect i'm not obtaining the desired results would be that the ID number in the first set can appear multiple times, with a unique StartDate.

What other methods are there of merging?
Super Contributor
Super Contributor
Posts: 3,174

Re: Dates and Merging.

Have a look at BY GROUP PROCESSING in the SAS.COM support site.

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search argument, this topic / post:

data step by group processing site:sas.com
Frequent Contributor
Frequent Contributor
Posts: 94

Re: Dates and Merging.

I think KSharp's method requires that the ID in table temp1 is unique, otherwise a merge would start producing weird results.

I have recently had a similar issue in joining up large time series data. I found the notes on Fuzzy Merges in http://www2.sas.com/proceedings/sugi24/Advtutor/p46-24.pdf to be quite useful in working out a solution.

Assuming that your "lookup" table - Temp2 in KSharp's example code - has no overlapping time ranges, then this should work. I used this method over something like an SQL join because both of my datasets are huge (about 5 million in Temp1 and about 100 million in Temp2).

By using an interleave, I reckon you're only really doing the equivalent of processing both data sets in full, with a single parse.


proc sort data=temp1;
by id startdate;
run;

proc sort data=temp2;
by id startdate;
run;

data want (drop=EndDate OtherValue);
set
temp2 (in=T2)
temp1 (in=T1);
by id startdate;

format Last_StartDate date9.;
format Last_EndDate date9.;
format Last_OtherValue best12.0;

retain Last_StartDate;
retain Last_EndDate;
retain Last_OtherValue;

/*if lookup dataset, then copy the values*/
if t2 then do;
Last_StartDate = StartDate;
Last_EndDate = EndDate;
Last_OtherValue = OtherValue;
end;
/*if the first instance of ID is in table 1, or T2's range has been past, then clear the values*/
if (t1 and (first.ID or Last_EndDate < StartDate)) then do;
Last_StartDate = .;
Last_EndDate = .;
Last_OtherValue = .;
end;

if t1 then output;

run;
Super User
Posts: 10,023

Re: Dates and Merging.

Yes. I suppose there is unique id in temp1.
If you have duplicate ids ,then sql 's Cartesian Product is good choice.
[pre]




data temp1;
infile datalines dlm=' ,';
input id $ StartDate : date9. Value;
format StartDate date9.;
datalines;
001,01JUL2010, 6050
001,01OCT2009, 6050
002,27MAY2010, 9050
003, 11DEC2010, 16500
;
run;
data temp2;
infile datalines dlm=' ,';
input id $ StartDate : date9. EndDate : date9. OtherValue;
datalines;
001, 01SEP2009,17DEC2009, 7500
001, 18DEC2009,29APR2010, 7000
001, 30APR2010,31DEC2010, 6000
002, 10MAY2010,16AUG2010, 11000
002, 17AUG2010,26NOV2010,10500
002, 27NOV2010,11APR2011,9500
002, 12APR2011,31DEC2010,9000
003, 01DEC2010,12JAN2011,20000
003, 13JAN2011, 25MAY2011,19500
003, 26MAY2011, 31DEC9999, 18990
;
run;

proc sql;
create table want as
select a.id as id,b.startdate as startdate,value,othervalue
from temp2 as a,temp1 as b
where a.id = b.id and b.startdate between a.startdate and a.enddate
;
quit;
[/pre]


Ksharp
Ask a Question
Discussion stats
  • 5 replies
  • 174 views
  • 0 likes
  • 4 in conversation