I am working with a data set where I only have the variables "BIRTHYEAR" in year formatted best32. and "TESTDATE" formatted yymmdd10.
I need to describe the data based on age and would like to create age cohorts. However, since the data captures 15 years of testing, the only way to accurately describe age would be the test date year minus the birth year. How would I code this?
Thanks!
Minimize the age error with:
age = yrdif(mdy(7, 1, birthyear), testdate, "AGE");
it should be exact about half the time and never be off by more than one year.
Hey all,
I am working with a data set where I only have the variables "BIRTHYEAR" in year formatted best32. and "TESTDATE" formatted yymmdd10.
I need to describe the data based on age and would like to create age cohorts. However, since the data captures 15 years of testing, the only way to accurately describe age would be the test date year minus the birth year. How would I code this?
Thanks!
Not saying this is the best but would give an upper bound on calendar age
data want;
set have;
age = intck('year',mdy(1,1,Birthyear),testdate);
run;
I would simply recommend using the year as the variable and groupoing according to that.
Also, you may want to look at some demographic methods, there are corrections for when you approximate age as a continuous variable to be an interval variable, depending on what type of statistics you're trying to measure.
Minimize the age error with:
age = yrdif(mdy(7, 1, birthyear), testdate, "AGE");
it should be exact about half the time and never be off by more than one year.
Thank you! When I run these, I get some values such as age = -4. I believe that some of the lines do not have test date or birth year values inputted. How would I deal with this?
missing birth years or test dates should generate missing ages. Can't say more without seeing data.
@mphqueens wrote:
Thank you! When I run these, I get some values such as age = -4. I believe that some of the lines do not have test date or birth year values inputted. How would I deal with this?
Likely to be a data entry issue. If the birth year is after the data of a test there is something fishy.
Data problem;
set have;
where birthyear > year(testdate);
run;
You may want to drop non-identification variables above just to look at problem records.
I have some inherited data from 1994 that still has "test dates" in the future. There are 6 records that the tests will occur in 2021 or 2020. We also people getting tested as early as the 11th century (only 979 years before birth) and several hundred where the test was before birth by varying amounts.
So check your raw data, see if some of the values can be corrected and prepare to remove some from your analysis set.
One of the first things I do with any dataset with dates is to see if they may sense. Some additional checks besides the relational birth should be before test are: Birth year in expected range? Test date in explected range (should not be greater than today unless it is "scheduled to be tested" date. You may check to see if any of your test dates are missing due to conversion problems. If you read a text file with an improbable date: day 31 in month with only 30 days, leap day not in leap year, Feb 30 or 31, year prior to 1581 or post 19999 (largest my data sources have thrown to me was 19983) and such. If truly paranoid you might verify that the day of the week for the test date makes sense or did not occur on a holiday if that should not happen.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.