Re: how to sort dataset with flags for specific conditions and calcula...

appleorange · Posted 04-10-2015 02:10 PM

I have a dataset with data that's like the following. Basically flag1 just keeps track if date1 was before an anchor date (begin) or date2 was before another anchor (end) date. The reason that it can get spread out over rows is because the table I am comparing date1 and date2 to have date ranges that are sporadic.

    ID Begin     End         date1       date2       flag1
    A   5/1/2007 11/30/2011 6/14/2011   5/14/2012   10
    A   12/1/2011 12/31/2011 6/14/2011   5/14/2012   00
    A   1/1/2012 9/30/2012   6/14/2011   5/14/2012   01
    B                                                 00
    C                                                 10
    C                                                 00
    E                                                 00
    E                                                 01
    F                                                 11

I didn't want to write out all the dates, the first one is just an example showing how flag1 works. But basically there could be 5 situations:

1. Both date1 and date2 are within Begin and End (11).
2. Neither date1 and date2 are within Begin and End (00).
3. Date1 is within Begin, but Date2 isn't until x rows later.
4. Date1 is within Begin, but Date2 never will be x rows later.
5. Date1 will never be within Begin, but Date2 will be x rows later.

Those first two are easy to deal with, but a situation like A, there could be several rows of 00 between the first.ID and last.ID. Similarly for 4. and 5., there could be several rows before the criteria is met.

The first step would be to only keep 1. and 3. only where both dates are within Begin/End on the same row or different rows. If they are on different rows, I need to find the difference between the previous End date and the next Begin for as many rows necessary and sum the days (if applicable). So for my example

    A:

    11/30/2011 - 12/1/2011
    12/31/2011 - 1/1/2012

I get that in this example there are no gaps, but there could be for other dates in my dataset. I have an inkling using a do loop of some sort along with first.recip_id or last.recip_id would work but I'm at a loss how to do all of this efficienctly.

Reeza · Posted 04-10-2015 04:36 PM

A few things:

1. I feel like I've seen this question before, so either you've reposted or its been on here recently.

2. You need to expand your example to clarify your rules - more specifically a HAVE and WANT dataset are great.

3. It seems like data structure might be a bit of a problem here, is this your original data structure or did you restructure it this way via a merge or join?

appleorange · Posted 04-10-2015 11:22 PM

Hi Reeza,

I haven't reposted this question on here but I did ask another one about first.variable and last.variable. I managed to cobble together a "method" but it requires a lot of datasteps - I don't have the code with me at the moment but I will edit my question to be clearer.

To answer the data structure question, it's kind of complicated but originally I had a dataset like this:

ID Anchor_Date

A 1/1/2011

B 1/2/2011

C 1/3/2011

There is a specified date range I have to compare the Anchor_Date to (example 30 days before Anchor_Date and 90 days after Anchor_Date). The dataset that contains this information has sporadic eligibility dates like this:

ID Begin End

A 1/1/2009 1/1/2010

A 1/1/2011 1/1/2012

There is no rule for how the Begin and End dates are organized. It's the way the data is compiled by something I can't control and the information could be spread out over several rows. So the only solution was to establish a date1 (Anchor_date - 30 days) and date2 (Anchor_date + 90 days) then check it against each row of Begin/End date. So I did merge all of this together, and I agree the data structure is iffy.

Ksharp · Posted 04-12-2015 02:06 AM

You can change the table

ID Begin End

A 1/1/2009 1/1/2010

A 1/1/2011 1/1/2012

as

ID Anchor_Date

A 1/1/2009

A 1/2/2009

A 1/3/2009

A 1/4/2009

..........

Then use Hash Table or proc format to check if it is falling the range you want .

appleorange · Posted 04-12-2015 06:53 PM

Hi,

Thanks for the help...but how do I change the table from a range horizontally to vertically? I'm assuming that it would go through each day between 1/1/2009 until 1/1/2010 as a whole column?

Ksharp · Posted 04-13-2015 07:53 AM

Yes. You are right. That will expand your small into a big table ..

"change the table from a range horizontally to vertically"

do Anchor_Date=Begin to End ;

output;

end;

how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Re: how to sort dataset with flags for specific conditions and calculate gaps

Registration is open

SAS Training: Just a Click Away