I have a dataset with data that's like the following. Basically flag1 just keeps track if date1 was before an anchor date (begin) or date2 was before another anchor (end) date. The reason that it can get spread out over rows is because the table I am comparing date1 and date2 to have date ranges that are sporadic.
ID Begin End date1 date2 flag1
A 5/1/2007 11/30/2011 6/14/2011 5/14/2012 10
A 12/1/2011 12/31/2011 6/14/2011 5/14/2012 00
A 1/1/2012 9/30/2012 6/14/2011 5/14/2012 01
B 00
C 10
C 00
E 00
E 01
F 11
I didn't want to write out all the dates, the first one is just an example showing how flag1 works. But basically there could be 5 situations:
1. Both date1 and date2 are within Begin and End (11).
2. Neither date1 and date2 are within Begin and End (00).
3. Date1 is within Begin, but Date2 isn't until x rows later.
4. Date1 is within Begin, but Date2 never will be x rows later.
5. Date1 will never be within Begin, but Date2 will be x rows later.
Those first two are easy to deal with, but a situation like A, there could be several rows of 00 between the first.ID and last.ID. Similarly for 4. and 5., there could be several rows before the criteria is met.
The first step would be to only keep 1. and 3. only where both dates are within Begin/End on the same row or different rows. If they are on different rows, I need to find the difference between the previous End date and the next Begin for as many rows necessary and sum the days (if applicable). So for my example
A:
11/30/2011 - 12/1/2011
12/31/2011 - 1/1/2012
I get that in this example there are no gaps, but there could be for other dates in my dataset. I have an inkling using a do loop of some sort along with first.recip_id or last.recip_id would work but I'm at a loss how to do all of this efficienctly.
A few things:
1. I feel like I've seen this question before, so either you've reposted or its been on here recently.
2. You need to expand your example to clarify your rules - more specifically a HAVE and WANT dataset are great.
3. It seems like data structure might be a bit of a problem here, is this your original data structure or did you restructure it this way via a merge or join?
Hi Reeza,
I haven't reposted this question on here but I did ask another one about first.variable and last.variable. I managed to cobble together a "method" but it requires a lot of datasteps - I don't have the code with me at the moment but I will edit my question to be clearer.
To answer the data structure question, it's kind of complicated but originally I had a dataset like this:
ID Anchor_Date
A 1/1/2011
B 1/2/2011
C 1/3/2011
There is a specified date range I have to compare the Anchor_Date to (example 30 days before Anchor_Date and 90 days after Anchor_Date). The dataset that contains this information has sporadic eligibility dates like this:
ID Begin End
A 1/1/2009 1/1/2010
A 1/1/2011 1/1/2012
There is no rule for how the Begin and End dates are organized. It's the way the data is compiled by something I can't control and the information could be spread out over several rows. So the only solution was to establish a date1 (Anchor_date - 30 days) and date2 (Anchor_date + 90 days) then check it against each row of Begin/End date. So I did merge all of this together, and I agree the data structure is iffy.
You can change the table
ID Begin End
A 1/1/2009 1/1/2010
A 1/1/2011 1/1/2012
as
ID Anchor_Date
A 1/1/2009
A 1/2/2009
A 1/3/2009
A 1/4/2009
..........
Then use Hash Table or proc format to check if it is falling the range you want .
Hi,
Thanks for the help...but how do I change the table from a range horizontally to vertically? I'm assuming that it would go through each day between 1/1/2009 until 1/1/2010 as a whole column?
Yes. You are right. That will expand your small into a big table ..
"change the table from a range horizontally to vertically"
do Anchor_Date=Begin to End ;
output;
end;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.