Hello all!,
I am attempting to merge two datasets together. Dataset 1 has two variables that indicate the beginning and end of a date period, which are roughly one month intervals (interval_begin and interval_end). Dataset 2 has dates (event_date) that I want to merge in but that line up wit the date interval (there are also other variables in the row that correspond to that event date that I will need in the row of data). I've tried to illustrate an example below of what I want the data to look like (id is in both datasets). In the example below, rows, 2 and 3 have the same intervals but different event dates. My ultimate goal is to be able to sum the number of unique event dates within that monthly interval (so 2 would be the sum for interval 2/3/2008-3/2/2008). The sum would be zero for interval 1/1/2008-2/2/2008 and would be 1 for the interval 3/3/2008-4/3/2008.
id interval_begin interval_end event_date
1 1/1/2008 2/2/2008 .
1 2/3/2008 3/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 .
2 11/4/2010 12/5/2010 .
2 12/6/2010 1/6/2011 .
2 1/7/2011 2/8/2011 2/4/2011
I have tried doing a simple merge by id but this is what I get.
id interval_begin interval_end event_date
1 1/1/2008 2/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 3/21/2008
2 11/4/2010 12/5/2010 2/4/2011
2 12/6/2010 1/6/2011 2/4/2011
2 1/7/2011 2/8/2011 2/4/2011
What is the best way to get my data into my desired structure so that I can produce the sums that I need? I'm going to have to merge with a 3rd dataset with another date variable but figured if I could get this to work with datasets 1 and 2, I can do this again with the third set. Any help would be most appreciated!
proc sql;
select a.*,b.*
from ds1 a
left join
ds2 b
on b.event_date between a.interval_begin and b.interval_end
and a.id = b.id;
quit;
Many to many merge on a SAS data step does not give the desired results.
Can you add a samples of dataset 1 and a sample of dataset 2 and then the desired output.
Absolutely!
Dataset 1 looks like this:
id interval_begin interval_end var1 var2 var3 var4....... and so forth
1 1/1/2008 2/2/2008
1 2/3/2008 3/2/2008
1 3/3/2008 4/3/2008
1 4/4/2008 5/1/2008
2 11/4/2010 12/5/2010
2 12/6/2010 1/6/2011
2 1/7/2011 2/8/2011
Dataset 2 looks like this:
id event_date var_a var_b var_c...... and so forth
1 2/15/2008
1 2/17/2008
1 3/21/2008
2 2/4/2011
Below is the desired output:
id interval_begin interval_end event_date var1 var2 var3 var_a var_b var_c ...etc.
1 1/1/2008 2/2/2008 .
1 2/3/2008 3/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 .
2 11/4/2010 12/5/2010 .
2 12/6/2010 1/6/2011 .
2 1/7/2011 2/8/2011 2/4/2011
proc sql;
select a.*,b.*
from ds1 a
left join
ds2 b
on b.event_date between a.interval_begin and b.interval_end
and a.id = b.id;
quit;
This worked beautifully, thank you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.