BookmarkSubscribeRSS Feed
chuakp
Obsidian | Level 7

I am trying to ascertain month of birth from an insurance claims enrollment file in long format.  AGE refers to the age at the beginning of the month.  The basic structure is like this:

 

ID       MONTH        AGE

1         1                   0

1         2                   0

1         3                   1

1         4                   1

1         5                   1

1         6                   1

1         7                   1

1         8                   1

1         9                   1

1        10                  1

1        11                  1

1        12                  1

 

This person was born in February because they were 0 years of age in February (month 2) and 1 year of age in March (month 3).  I have been trying unsuccessfuly to use some combination of "by id" and "retain" statements in a DATA step to get SAS to set the month of birth to the first instance in which AGE changes values.  Any suggestions?  Thanks.

7 REPLIES 7
PGStats
Opal | Level 21

Assuming you also have YEAR in your data  

 

data birthdays;
set have; by id year age;
if first.age then output;
run;

proc sql;
create table birthMonths as
select unique 
	id, 
	intnx( "MONTH", mdy(month-1, 1, year-age), -1) as birthMonth format=mmyyd.
from birthdays;
quit;

 

Check for IDs that get more than one record.

 

(untested)

PG
Steelers_In_DC
Barite | Level 11

I'm not sure if you are looking for the word February or the number 2, here's an examle that has both:

 

data have;
input ID       MONTH        AGE;
cards;
1         1                  0
1         2                  0
1         3                  1
1         4                  1
1         5                  1
1         6                  1
1         7                  1
1         8                  1
1         9                  1
1        10                  1
1        11                  1
1        12                  1
;

proc format;
 value new   1 = 'January'
              2 = 'February'
             3 = 'March'
             4 = 'April'
             5 = 'May'
             6 = 'June'
             7 = 'July'
             8 = 'August'
             9 = 'September'
            10 = 'October'
            11 = 'November'
            12 = 'December'
    ;

data want;
set have;
by id age;
if last.age and age = 0 then do;
Birth_Month = month;
Month_Birth = month;
format birth_month new.;
end;
run;

ballardw
Super User

Without year I find this very odd but a different approach if your month variable is numeric:

 

proc summary data=have nway;

where age=0;

class id;

var month;

output out=want (drop=_type_ _freq_) max=BirthMonth;

run;

 

data have;

merge have want;

by id;

run;

chuakp
Obsidian | Level 7

Thanks for these responses.  I realized I didn't do a good job of illustrating the complexity of this problem.  There are four types of people that require different programming appraoches:

 

1) There are some individuals like ID 1 who are in the dataset for 12 months and whose AGE changes mid-year.  Since AGE refers to age at the beginning of the month, ID 1 was born February 2-March 1, 2012 since he/she was born in 2012, 0 years old on February 1, 2013, and 1 year old on March 1, 2013.  For the purposes of this analysis, I'm going to assign them to the birth month of February (want BIRTHMONTH = 2)

 

2) There are some individuals like ID 2 who are in the dataset for 12 months and whose AGE does not change mid-year.  ID 2 was born December  2 - December 31, 2011 because he/she was born in 2011 and was 2 years old at the beginning of every month of 2013 (want BIRTHMONTH = 12).

 

3) Some people like ID 3 were clearly not births and were enrolled in the insurance plan for only 5 months.  I'd actually like to kick these person out of the dataset since I'm interested in people who were continuously enrolled for all 12 months (excluding births).

 

4) ID 4 is a 0-year old who entered into the dataset in August 2013.  While it's possible that this person was born earlier in the year and only appeared in the dataset because they switched to this insurance plan in August, I'm going to assume this person was born in August (want BIRTHMONTH = 😎 - I'll check this assumption later by looking for a birth-related claim.

 

If I just had date of birth or even month of birth, none of these contortions would be necessary, but unfortunately this is what I'm faced with.  Thanks.

 

data have;
input ID MONTH AGE BIRTHYEAR CURRENTYEAR;
cards;
1 1 0 2012 2013
1 2 0 2012 2013
1 3 1 2012 2013
1 4 1 2012 2013
1 5 1 2012 2013
1 6 1 2012 2013
1 7 1 2012 2013
1 8 1 2012 2013
1 9 1 2012 2013
1 10 1 2012 2013
1 11 1 2012 2013
1 12 1 2012 2013
2 1 2 2011 2013
2 2 2 2011 2013
2 3 2 2011 2013
2 4 2 2011 2013
2 5 2 2011 2013
2 6 2 2011 2013
2 7 2 2011 2013
2 8 2 2011 2013
2 9 2 2011 2013
2 10 2 2011 2013
2 11 2 2011 2013
2 12 2 2011 2013
3 1 4 2010 2013
3 2 4 2010 2013
3 3 4 2010 2013
3 4 4 2010 2013
3 5 4 2010 2013
4 8 0 2013 2013
4 9 0 2013 2013
4 10 0 2013 2013
4 11 0 2013 2013
4 12 0 2013 2013

;
run;

Steelers_In_DC
Barite | Level 11

This turned into a real crap-fest.  It's not what I would call "easy to look at" but here's a solution:

 

data want other;
do until(last.id);
set have;
by id age;
/*Scenario 1*/
if not first.id then do;
if last.age and lag2(age) ne age then birth_month = month;
end;
/*Scenario 2*/
retain _jan_age;
if month = 1 then _jan_age = age;
if month = 12 and age = _jan_age then birth_month = month;
/*Scenario 4*/
retain _first_id _first_month;
if first.id and month ne 1 then _first_month = month;
if first.id and month = 1 then _first_month = .;
if first.id then _first_id = age;
if last.id then _last_id = age;
if month = 12 and not missing(_first_month) then Flag = 'Research';
end;
/*Roles up birth_month and research to the id level*/
do until(last.id);
set have;
by id age;
if flag = 'Research' or not missing(birth_month) then output want;
/*Scenario 3*/
else output other;
end;
drop _:;
run;

 

chuakp
Obsidian | Level 7

Thanks for looking at this - I really appreciate it.  Let me try it on my dataset.

chuakp
Obsidian | Level 7

 

There seems to be an error since the only three values of birthmonth from the output dataset are 2, 3, and 12.  I think this has to do with the fact that the code uses lag2 instead of all possible values of lag like lag3, lag4, etc. 

 

 

Focusing on scenario 1 and 2 only for the moment, I'm trying the following code that manually compares every age to January age; if they are not equivalent, birth month is set to that month and then the loop should exit.  However, this code results in an infinite loop.

 

 

data want; set have;

by id;
retain _jan_age;

if month = 1 then _jan_age = age;

birthmonth = .;

do while(birthmonth = .);

     if age NE _jan_age then birthmonth = month - 1;
     if month = 12 and age = _jan_age then birthmonth = 12;
end;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1035 views
  • 1 like
  • 4 in conversation