I am trying to ascertain month of birth from an insurance claims enrollment file in long format. AGE refers to the age at the beginning of the month. The basic structure is like this:
ID MONTH AGE
1 1 0
1 2 0
1 3 1
1 4 1
1 5 1
1 6 1
1 7 1
1 8 1
1 9 1
1 10 1
1 11 1
1 12 1
This person was born in February because they were 0 years of age in February (month 2) and 1 year of age in March (month 3). I have been trying unsuccessfuly to use some combination of "by id" and "retain" statements in a DATA step to get SAS to set the month of birth to the first instance in which AGE changes values. Any suggestions? Thanks.
Assuming you also have YEAR in your data
data birthdays;
set have; by id year age;
if first.age then output;
run;
proc sql;
create table birthMonths as
select unique
id,
intnx( "MONTH", mdy(month-1, 1, year-age), -1) as birthMonth format=mmyyd.
from birthdays;
quit;
Check for IDs that get more than one record.
(untested)
I'm not sure if you are looking for the word February or the number 2, here's an examle that has both:
data have;
input ID MONTH AGE;
cards;
1 1 0
1 2 0
1 3 1
1 4 1
1 5 1
1 6 1
1 7 1
1 8 1
1 9 1
1 10 1
1 11 1
1 12 1
;
proc format;
value new 1 = 'January'
2 = 'February'
3 = 'March'
4 = 'April'
5 = 'May'
6 = 'June'
7 = 'July'
8 = 'August'
9 = 'September'
10 = 'October'
11 = 'November'
12 = 'December'
;
data want;
set have;
by id age;
if last.age and age = 0 then do;
Birth_Month = month;
Month_Birth = month;
format birth_month new.;
end;
run;
Without year I find this very odd but a different approach if your month variable is numeric:
proc summary data=have nway;
where age=0;
class id;
var month;
output out=want (drop=_type_ _freq_) max=BirthMonth;
run;
data have;
merge have want;
by id;
run;
Thanks for these responses. I realized I didn't do a good job of illustrating the complexity of this problem. There are four types of people that require different programming appraoches:
1) There are some individuals like ID 1 who are in the dataset for 12 months and whose AGE changes mid-year. Since AGE refers to age at the beginning of the month, ID 1 was born February 2-March 1, 2012 since he/she was born in 2012, 0 years old on February 1, 2013, and 1 year old on March 1, 2013. For the purposes of this analysis, I'm going to assign them to the birth month of February (want BIRTHMONTH = 2)
2) There are some individuals like ID 2 who are in the dataset for 12 months and whose AGE does not change mid-year. ID 2 was born December 2 - December 31, 2011 because he/she was born in 2011 and was 2 years old at the beginning of every month of 2013 (want BIRTHMONTH = 12).
3) Some people like ID 3 were clearly not births and were enrolled in the insurance plan for only 5 months. I'd actually like to kick these person out of the dataset since I'm interested in people who were continuously enrolled for all 12 months (excluding births).
4) ID 4 is a 0-year old who entered into the dataset in August 2013. While it's possible that this person was born earlier in the year and only appeared in the dataset because they switched to this insurance plan in August, I'm going to assume this person was born in August (want BIRTHMONTH = 😎 - I'll check this assumption later by looking for a birth-related claim.
If I just had date of birth or even month of birth, none of these contortions would be necessary, but unfortunately this is what I'm faced with. Thanks.
data have;
input ID MONTH AGE BIRTHYEAR CURRENTYEAR;
cards;
1 1 0 2012 2013
1 2 0 2012 2013
1 3 1 2012 2013
1 4 1 2012 2013
1 5 1 2012 2013
1 6 1 2012 2013
1 7 1 2012 2013
1 8 1 2012 2013
1 9 1 2012 2013
1 10 1 2012 2013
1 11 1 2012 2013
1 12 1 2012 2013
2 1 2 2011 2013
2 2 2 2011 2013
2 3 2 2011 2013
2 4 2 2011 2013
2 5 2 2011 2013
2 6 2 2011 2013
2 7 2 2011 2013
2 8 2 2011 2013
2 9 2 2011 2013
2 10 2 2011 2013
2 11 2 2011 2013
2 12 2 2011 2013
3 1 4 2010 2013
3 2 4 2010 2013
3 3 4 2010 2013
3 4 4 2010 2013
3 5 4 2010 2013
4 8 0 2013 2013
4 9 0 2013 2013
4 10 0 2013 2013
4 11 0 2013 2013
4 12 0 2013 2013
;
run;
This turned into a real crap-fest. It's not what I would call "easy to look at" but here's a solution:
data want other;
do until(last.id);
set have;
by id age;
/*Scenario 1*/
if not first.id then do;
if last.age and lag2(age) ne age then birth_month = month;
end;
/*Scenario 2*/
retain _jan_age;
if month = 1 then _jan_age = age;
if month = 12 and age = _jan_age then birth_month = month;
/*Scenario 4*/
retain _first_id _first_month;
if first.id and month ne 1 then _first_month = month;
if first.id and month = 1 then _first_month = .;
if first.id then _first_id = age;
if last.id then _last_id = age;
if month = 12 and not missing(_first_month) then Flag = 'Research';
end;
/*Roles up birth_month and research to the id level*/
do until(last.id);
set have;
by id age;
if flag = 'Research' or not missing(birth_month) then output want;
/*Scenario 3*/
else output other;
end;
drop _:;
run;
Thanks for looking at this - I really appreciate it. Let me try it on my dataset.
There seems to be an error since the only three values of birthmonth from the output dataset are 2, 3, and 12. I think this has to do with the fact that the code uses lag2 instead of all possible values of lag like lag3, lag4, etc.
Focusing on scenario 1 and 2 only for the moment, I'm trying the following code that manually compares every age to January age; if they are not equivalent, birth month is set to that month and then the loop should exit. However, this code results in an infinite loop.
data want; set have;
by id;
retain _jan_age;
if month = 1 then _jan_age = age;
birthmonth = .;
do while(birthmonth = .);
if age NE _jan_age then birthmonth = month - 1;
if month = 12 and age = _jan_age then birthmonth = 12;
end;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.