Hi Everyone,
I have a long-format dataset (multiple rows per person), and each row has an age_start and an age_end that define the 0.5-year age range the row represents; for example, a row might have age_start=20 and age_end=20.5. My lowest age_start is 17.5 and my highest age_end is 93, and there is at least one person in every 0.5-year interval between 17.5 and 93.
I want to generate time indicators from these age ranges so that I can align everyone on age. Some people have gaps between visits (note that ID=2 has no visit 3, so there is a gap!). Starting at age_start=17.5, the indicator index should be 0 and then increase by 1 for each subsequent 0.5 years: a row with age_start=17.5 gets age0=1, a row with age_start=18 gets age0=0 and age1=1, and so on. Each row should have exactly ONE age_X variable equal to 1, with all the others equal to 0. Since (93 - 17.5) / 0.5 = 151, that means 151 indicator variables, one for each 0.5-year interval between 17.5 and 93.
I think an array is the best approach here, but I'm not sure how to base it on the age_start variable, since each person can start at a different age and there can be gaps. Any help is appreciated!
An example of the data I have is below:
data have;
  input ID visit age_start age_end;
  datalines;
1 1 17.5 18
1 2 18 18.5
1 3 18.5 19
2 1 25 25.5
2 2 25.5 26
2 4 26.5 27
3 1 51.5 52
3 2 52 52.5
3 3 52.5 53
;
run;
Data I want:
ID  visit  age_start  age_end  age0  age1  age2  ...  age15  age16  age17  age18  ...  age68  age69  age70  ...
1   1      17.5       18       1     0     0     ...  0      0      0      0      ...  0      0      0      ...
1   2      18         18.5     0     1     0     ...  0      0      0      0      ...  0      0      0      ...
1   3      18.5       19       0     0     1     ...  0      0      0      0      ...  0      0      0      ...
2   1      25         25.5     0     0     0     ...  1      0      0      0      ...  0      0      0      ...
2   2      25.5       26       0     0     0     ...  0      1      0      0      ...  0      0      0      ...
2   4      26.5       27       0     0     0     ...  0      0      0      1      ...  0      0      0      ...
3   1      51.5       52       0     0     0     ...  0      0      0      0      ...  1      0      0      ...
3   2      52         52.5     0     0     0     ...  0      0      0      0      ...  0      1      0      ...
3   3      52.5       53       0     0     0     ...  0      0      0      0      ...  0      0      1      ...
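A minimal sketch of the array approach described in the question, assuming every age_start is a multiple of 0.5 between 17.5 and 92.5. It names the indicators age0-age150 to match the desired output and gives the array an explicit lower bound of 0 so the array index equals the variable suffix. Instead of comparing every index, it computes the index directly as (age_start - 17.5) * 2, so a gap such as ID 2's missing visit 3 simply means no row has age17=1. The array name ageflag and the helper variables j and idx are hypothetical choices.

```sas
/* Sketch: age0 = [17.5, 18), age1 = [18, 18.5), ..., age150 = [92.5, 93) */
/* Assumes age_start is a multiple of 0.5 between 17.5 and 92.5.         */
data have;
  input ID visit age_start age_end;
  datalines;
1 1 17.5 18
1 2 18 18.5
1 3 18.5 19
2 1 25 25.5
2 2 25.5 26
2 4 26.5 27
3 1 51.5 52
3 2 52 52.5
3 3 52.5 53
;
run;

data want;
  set have;
  /* Lower bound 0 makes ageflag{k} refer to variable agek */
  array ageflag{0:150} age0-age150;
  /* Start every indicator at 0 */
  do j = 0 to 150;
    ageflag{j} = 0;
  end;
  /* Number of half-years since 17.5; ROUND guards against tiny float noise */
  idx = round((age_start - 17.5) * 2);
  if 0 <= idx <= 150 then ageflag{idx} = 1;
  else put "WARNING: age_start out of range " ID= visit= age_start=;
  drop j idx;
run;
```

A quick check of the "exactly one indicator per row" requirement is sum(of age0-age150), which should equal 1 on every row.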
My first question is: why do you need those dummy variables? Procedures with a CLASS statement generate the dummies they need internally.
data have;
  input ID visit age_start age_end;
  /* age1 = [17.5, 18), age2 = [18, 18.5), ..., age151 = [92.5, 93) */
  array age{151} age1-age151;
  do i = 1 to 151;
    /* Flag the one interval this row's age_start falls in */
    if i = (age_start - 17) * 2 then age{i} = 1;
    else age{i} = 0;
  end;
  drop i;
  datalines;
1 1 17.5 18
1 2 18 18.5
1 3 18.5 19
2 1 25 25.5
2 2 25.5 26
2 4 26.5 27
3 1 51.5 52
3 2 52 52.5
3 3 52.5 53
;
run;
I need to export my data to run the analyses in Stata, so I thought I had to generate the indicator variables myself. When I ran the code you gave, the data step never finished and I eventually had to break it.
@bgosiker - the program as posted by @arthurcavila works fine for me. You have obviously changed it, so please post the code you are trying to run.
Doesn't Stata generate indicator variables when you fit a model? And can't SAS run the same models?
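If the goal is only to get the data into Stata, one option is to export a single integer index instead of 151 dummies. In Stata, factor-variable notation such as i.age_idx in an estimation command expands a categorical variable into indicators at fit time, much as a CLASS statement does in SAS. A minimal sketch, assuming the have data set from the question; the variable name age_idx and the output file name have_idx.csv are hypothetical.

```sas
data have;
  input ID visit age_start age_end;
  datalines;
1 1 17.5 18
1 2 18 18.5
1 3 18.5 19
2 1 25 25.5
2 2 25.5 26
2 4 26.5 27
3 1 51.5 52
3 2 52 52.5
3 3 52.5 53
;
run;

/* age_idx (hypothetical name): 0 = [17.5, 18), 1 = [18, 18.5), ... */
data have_idx;
  set have;
  age_idx = round((age_start - 17.5) * 2);
run;

/* Hypothetical file name; DBMS=CSV writes a comma-separated text file */
proc export data=have_idx
    outfile="have_idx.csv"
    dbms=csv
    replace;
run;
```

After reading the file into Stata with import delimited, a model written as, for example, regress y i.age_idx (where y stands in for your outcome) would create the age indicators itself.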