BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mphqueens
Fluorite | Level 6

I am working with a data set where I only have the variables "BIRTHYEAR" in year formatted best32. and "TESTDATE" formatted yymmdd10.

 

I need to describe the data based on age and would like to create age cohorts. However, since the data captures 15 years of testing, the only way to accurately describe age would be the test date year minus the birth year. How would I code this?

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Minimize the age error with:

 

age = yrdif(mdy(7, 1, birthyear), testdate, "AGE");

 

it should be exact about half the time and never be off by more than one year.

PG

View solution in original post

7 REPLIES 7
mphqueens
Fluorite | Level 6

Hey all,

 

I am working with a data set where I only have the variables "BIRTHYEAR" in year formatted best32. and "TESTDATE" formatted yymmdd10.

 

I need to describe the data based on age and would like to create age cohorts. However, since the data captures 15 years of testing, the only way to accurately describe age would be the test date year minus the birth year. How would I code this?

 

Thanks!

ballardw
Super User

Not saying this is the best but would give an upper bound on calendar age

 

data want;
   set have;
   age = intck('year',mdy(1,1,Birthyear),testdate);
run;
Reeza
Super User

 

I would simply recommend using the year as the variable and groupoing according to that.

 

Also, you may want to look at some demographic methods, there are corrections for when you approximate age as a continuous variable to be an interval variable, depending on what type of statistics you're trying to measure.

 

 

PGStats
Opal | Level 21

Minimize the age error with:

 

age = yrdif(mdy(7, 1, birthyear), testdate, "AGE");

 

it should be exact about half the time and never be off by more than one year.

PG
mphqueens
Fluorite | Level 6

Thank you! When I run these, I get some values such as age = -4. I believe that some of the lines do not have test date or birth year values inputted. How would I deal with this?

PGStats
Opal | Level 21

missing birth years or test dates should generate missing ages. Can't say more without seeing data.

PG
ballardw
Super User

@mphqueens wrote:

Thank you! When I run these, I get some values such as age = -4. I believe that some of the lines do not have test date or birth year values inputted. How would I deal with this?


 

Likely to be a data entry issue. If the birth year is after the data of a test there is something fishy.

 

Data problem;

   set have;

   where birthyear > year(testdate);

run;

 

You may want to drop non-identification variables above just to look at problem records.

 

I have some inherited data from 1994 that still has "test dates" in the future. There are 6 records that the tests will occur in 2021 or 2020. We also people getting tested as early as the 11th century (only 979 years before birth) and several hundred where the test was before birth by varying amounts.

 

So check your raw data, see if some of the values can be corrected and prepare to remove some from your analysis set.

One of the first things I do with any dataset with dates is to see if they may sense. Some additional checks besides the relational birth should be before test are: Birth year in expected range? Test date in explected range (should not be greater than today unless it is "scheduled to be tested" date. You may check to see if any of your test dates are missing due to conversion problems. If you read a text file with an improbable date: day 31 in month with only 30 days, leap day not in leap year, Feb 30 or 31, year prior to 1581 or post 19999 (largest my data sources have thrown to me was 19983) and such. If truly paranoid you might verify that the day of the week for the test date makes sense or did not occur on a holiday if that should not happen.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2818 views
  • 1 like
  • 4 in conversation