Solved
Contributor
Posts: 20

First and Last date in a wide format data set

I am trying to capture the first and last dates in a dataset where I have enrollment indicators for each month 1-12 labeled enrind{i} : enrind1-enrind12, month 1 being January, month 2 being, feb, etc. Each enrind{i} = 0 or 1 depending on whether or not they were enrolled.

I am familiar with how to go about this when the data is in a long format but this data set is quite massive and I would rather not transpose it if I don't have to.

What I want to do is to create two variables, first enrollment month and last enrollment month. Call them first_mo and last_mo. The contents of the variables would be the first and last month of enrollment, so a number 1-12 based on the enrollment indicator month. So far I am thinking of this:

data want

set have

array enrind{i} \$ enrind1-enrind12;

first_mo = 0;

do i = 1 to 12;

if enrind{i} = 1 then return ?

I am not sure what to have SAS return to give me the value of the variable.

If I am on the right path let me know how I may proceed. If I am not please let me know what other ways I can perform this. I imagine proc sql would be involved...

Thank You Very Much

Accepted Solutions
Solution
‎03-17-2014 02:03 PM
Super User
Posts: 23,683

Re: First and Last date in a wide format data set

What do you mean by continuous enrollment?

Post sample data and expected output.

Here's a sample based on your current requirements, you'll need to adjust it for the case of all 0's

data have;

input id month1-month12;

cards;

1 0 0 1 1 1 1 0 0 0 0 0 0

2 0 0 1 1 1 1 1 1 1 0 0 0

3 1 1 1 1 1 1 1 0 0 0 0 0

4 0 0 0 0 0 0 0 0 0 0 0 0

5 1 1 1 1 1 1 1 1 1 1 1 1

;

run;

data want;

set have;

array month(12);

sample=catt(of month(*));

first_one=index( catt(of month(*)), "1");

last_one=12-index(reverse(catt(of month(*))), "1")+1;

run;

All Replies
Super User
Posts: 23,683

Re: First and Last date in a wide format data set

What happens if they have discontinuous enrollment,

ie

1 1 0 0 1 0 0 0

Contributor
Posts: 20

Re: First and Last date in a wide format data set

My overarching goal here is to create a continuous enrollment. This is just one of the steps in the process (I think, never done it before but it's in the instructions I was given.) Getting the first_mo was easy:

first_mo = 0;

do i = 1 to 12 until (first_mo=i);

if enrind{i} = 1 then first_mo = i;

end;

But now the discontinuous enrollment is a problem when trying to find the last_mo "last date"

If you or someone else knows how to make a continuous enrollment I'd certainly appreciate the input

Solution
‎03-17-2014 02:03 PM
Super User
Posts: 23,683

Re: First and Last date in a wide format data set

What do you mean by continuous enrollment?

Post sample data and expected output.

Here's a sample based on your current requirements, you'll need to adjust it for the case of all 0's

data have;

input id month1-month12;

cards;

1 0 0 1 1 1 1 0 0 0 0 0 0

2 0 0 1 1 1 1 1 1 1 0 0 0

3 1 1 1 1 1 1 1 0 0 0 0 0

4 0 0 0 0 0 0 0 0 0 0 0 0

5 1 1 1 1 1 1 1 1 1 1 1 1

;

run;

data want;

set have;

array month(12);

sample=catt(of month(*));

first_one=index( catt(of month(*)), "1");

last_one=12-index(reverse(catt(of month(*))), "1")+1;

run;

Contributor
Posts: 20

Re: First and Last date in a wide format data set

I prematurely marked this as correct. It is correct for the most part but unfortunately there is one case where this fails. For instance, the following case:

111111101111

111100111111

Reads as the first month being 1 and last being 12 . Same for the second case. My goal would be to have this read as the first month = 1 and last month = 7 for the first case and first month = 1 and last month = 4 for the second case.

so, last_mo = 12-index(reverse(catt(of enrind{*})), "1"), + 1 should be modified so that the first 0 it sees in a string of 1s reading from the left, take that and subtract 1 giving me that last month enrolled. I'm not sure how to do this.

Posts: 3,167

Re: First and Last date in a wide format data set

Similar to Reeza's solution:

data want;

set have;

length cat \$ 100;

array mon mon:;

do over mon;

if mon=1 then cat=catx('_',cat,_i_);

end;

first_mo=input(scan(cat,1,'_'),2.);

last_mo=input(scan(cat,-1,'_'),2.);

run;

Contributor
Posts: 20

Re: First and Last date in a wide format data set

The plot thickens. I am trying to construct episodes of coverage for patients in an insurance database. My episodes of coverage are continuous periods of time that a patient was enrolled in an insurance plan. This is what I have:

ID   YEAR   ENRIND1-ENRIND12    MEMDAYS1-MEMDAYS12

101 2008    0 0 1 1 1 1 1 1 1 1 1 1   0 0 31 30……………..31

101 2009    1 1 1 1 1 0 0 1 1 1 1 1   31 28…0 0……………31

101 2010    1 1 1 1 1 1 1 1 1 0 0 0   31 28………………0 0 0

What I want to make is two episodes of continuous coverage for this:

ID     FIRST1     LAST1      FIRST2      LAST2      …  FIRSTN    LASTN

101  2/1/2008  5/31/2009    8/1/2009   9/30/2010

The MEMDAYS1-12 variables might be necessary to create the day for LAST1-N date variables. The FIRST1-N will always start with the first of the month.

Complications being that the duplications for IDs are necessary for other parts of the analysis and I think it will be quite fine to end up with something like this:

ID     FIRST1     LAST1      FIRST2      LAST2      …  FIRSTN    LASTN

101  2/1/2008  5/31/2009    8/1/2009   9/30/2010

101  2/1/2008  5/31/2009    8/1/2009   9/30/2010

101  2/1/2008  5/31/2009    8/1/2009   9/30/2010

I added the FIRSTN LASTN because some people may have > 2 periods of continuous coverage.

Any assistance here is appreciated, I only wish I could repay in some way.