Hello all!,
I am attempting to merge two datasets together. Dataset 1 has two variables that indicate the beginning and end of a date period, which are roughly one month intervals (interval_begin and interval_end). Dataset 2 has dates (event_date) that I want to merge in but that line up wit the date interval (there are also other variables in the row that correspond to that event date that I will need in the row of data). I've tried to illustrate an example below of what I want the data to look like (id is in both datasets). In the example below, rows, 2 and 3 have the same intervals but different event dates. My ultimate goal is to be able to sum the number of unique event dates within that monthly interval (so 2 would be the sum for interval 2/3/2008-3/2/2008). The sum would be zero for interval 1/1/2008-2/2/2008 and would be 1 for the interval 3/3/2008-4/3/2008.
id interval_begin interval_end event_date
1 1/1/2008 2/2/2008 .
1 2/3/2008 3/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 .
2 11/4/2010 12/5/2010 .
2 12/6/2010 1/6/2011 .
2 1/7/2011 2/8/2011 2/4/2011
I have tried doing a simple merge by id but this is what I get.
id interval_begin interval_end event_date
1 1/1/2008 2/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 3/21/2008
2 11/4/2010 12/5/2010 2/4/2011
2 12/6/2010 1/6/2011 2/4/2011
2 1/7/2011 2/8/2011 2/4/2011
What is the best way to get my data into my desired structure so that I can produce the sums that I need? I'm going to have to merge with a 3rd dataset with another date variable but figured if I could get this to work with datasets 1 and 2, I can do this again with the third set. Any help would be most appreciated!
proc sql;
select a.*,b.*
from ds1 a
left join
ds2 b
on b.event_date between a.interval_begin and b.interval_end
and a.id = b.id;
quit;
Many to many merge on a SAS data step does not give the desired results.
Can you add a samples of dataset 1 and a sample of dataset 2 and then the desired output.
Absolutely!
Dataset 1 looks like this:
id interval_begin interval_end var1 var2 var3 var4....... and so forth
1 1/1/2008 2/2/2008
1 2/3/2008 3/2/2008
1 3/3/2008 4/3/2008
1 4/4/2008 5/1/2008
2 11/4/2010 12/5/2010
2 12/6/2010 1/6/2011
2 1/7/2011 2/8/2011
Dataset 2 looks like this:
id event_date var_a var_b var_c...... and so forth
1 2/15/2008
1 2/17/2008
1 3/21/2008
2 2/4/2011
Below is the desired output:
id interval_begin interval_end event_date var1 var2 var3 var_a var_b var_c ...etc.
1 1/1/2008 2/2/2008 .
1 2/3/2008 3/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 .
2 11/4/2010 12/5/2010 .
2 12/6/2010 1/6/2011 .
2 1/7/2011 2/8/2011 2/4/2011
proc sql;
select a.*,b.*
from ds1 a
left join
ds2 b
on b.event_date between a.interval_begin and b.interval_end
and a.id = b.id;
quit;
This worked beautifully, thank you!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.