Hello all!,
I am attempting to merge two datasets together. Dataset 1 has two variables that indicate the beginning and end of a date period, which are roughly one month intervals (interval_begin and interval_end). Dataset 2 has dates (event_date) that I want to merge in but that line up wit the date interval (there are also other variables in the row that correspond to that event date that I will need in the row of data). I've tried to illustrate an example below of what I want the data to look like (id is in both datasets). In the example below, rows, 2 and 3 have the same intervals but different event dates. My ultimate goal is to be able to sum the number of unique event dates within that monthly interval (so 2 would be the sum for interval 2/3/2008-3/2/2008). The sum would be zero for interval 1/1/2008-2/2/2008 and would be 1 for the interval 3/3/2008-4/3/2008.
id interval_begin interval_end event_date
1 1/1/2008 2/2/2008 .
1 2/3/2008 3/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 .
2 11/4/2010 12/5/2010 .
2 12/6/2010 1/6/2011 .
2 1/7/2011 2/8/2011 2/4/2011
I have tried doing a simple merge by id but this is what I get.
id interval_begin interval_end event_date
1 1/1/2008 2/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 3/21/2008
2 11/4/2010 12/5/2010 2/4/2011
2 12/6/2010 1/6/2011 2/4/2011
2 1/7/2011 2/8/2011 2/4/2011
What is the best way to get my data into my desired structure so that I can produce the sums that I need? I'm going to have to merge with a 3rd dataset with another date variable but figured if I could get this to work with datasets 1 and 2, I can do this again with the third set. Any help would be most appreciated!
proc sql;
select a.*,b.*
from ds1 a
left join
ds2 b
on b.event_date between a.interval_begin and b.interval_end
and a.id = b.id;
quit;
Many to many merge on a SAS data step does not give the desired results.
Can you add a samples of dataset 1 and a sample of dataset 2 and then the desired output.
Absolutely!
Dataset 1 looks like this:
id interval_begin interval_end var1 var2 var3 var4....... and so forth
1 1/1/2008 2/2/2008
1 2/3/2008 3/2/2008
1 3/3/2008 4/3/2008
1 4/4/2008 5/1/2008
2 11/4/2010 12/5/2010
2 12/6/2010 1/6/2011
2 1/7/2011 2/8/2011
Dataset 2 looks like this:
id event_date var_a var_b var_c...... and so forth
1 2/15/2008
1 2/17/2008
1 3/21/2008
2 2/4/2011
Below is the desired output:
id interval_begin interval_end event_date var1 var2 var3 var_a var_b var_c ...etc.
1 1/1/2008 2/2/2008 .
1 2/3/2008 3/2/2008 2/15/2008
1 2/3/2008 3/2/2008 2/17/2008
1 3/3/2008 4/3/2008 3/21/2008
1 4/4/2008 5/1/2008 .
2 11/4/2010 12/5/2010 .
2 12/6/2010 1/6/2011 .
2 1/7/2011 2/8/2011 2/4/2011
proc sql;
select a.*,b.*
from ds1 a
left join
ds2 b
on b.event_date between a.interval_begin and b.interval_end
and a.id = b.id;
quit;
This worked beautifully, thank you!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.