Hi Everyone,
I have a long dataset (multiple rows per person), and each row has a "start age" and an "end age" that indicate the 0.5-year age range the row represents, i.e., someone would have age_start=20 and age_end=20.5. My lowest age_start is 17.5 and my highest age_end is 93, and there is at least one person in each 0.5-year increment between 17.5 and 93.
I want to generate time indicators based on these age ranges so that I can align everyone by age. Some people have gaps between visits (note that ID=2 is missing visit=3, so there is a gap!). Basically, starting at age_start=17.5, I want indicator variables named age_X, where the suffix X is 0 for the 17.5-18 band and increases by 1 for each successive 0.5 years (i.e., a row with age_start=18 would have age0=0 and age1=1). Each row should have exactly ONE age_X variable equal to 1, with all the others 0. That gives 151 indicator variables, one for each 0.5-year range between 17.5 and 93.
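Put as a formula (the symbol k is introduced here only for illustration), the index of the indicator that should be 1 on a given row is

$$
k = \frac{\text{age\_start} - 17.5}{0.5} = 2\,(\text{age\_start} - 17.5), \qquad k \in \{0, 1, \dots, 150\}
$$

so the row gets age_k = 1 and every other indicator 0. For example, age_start = 25 gives k = 15 and age_start = 26.5 gives k = 18, and the number of bands is (93 - 17.5)/0.5 = 151. If the indicators are numbered from 1 instead (age1 to age151), the index becomes k + 1 = 2(age_start - 17).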
I think an array would be the best approach here, but I was unsure how to base it on the age_start variable, since each person can start at a different age and there can be gaps. Any help is appreciated!
An example of the data I have is below:
data have;
  input ID visit age_start age_end;
  datalines;
1 1 17.5 18
1 2 18 18.5
1 3 18.5 19
2 1 25 25.5
2 2 25.5 26
2 4 26.5 27
3 1 51.5 52
3 2 52 52.5
3 3 52.5 53
;
run;
Data I want (only some of the 151 indicator columns are shown):
id  visit  age_start  age_end  age0  age1  age2  ...  age15  age16  age17  age18  ...  age68  age69  age70  ...
1   1      17.5       18       1     0     0          0      0      0      0           0      0      0
1   2      18         18.5     0     1     0          0      0      0      0           0      0      0
1   3      18.5       19       0     0     1          0      0      0      0           0      0      0
2   1      25         25.5     0     0     0          1      0      0      0           0      0      0
2   2      25.5       26       0     0     0          0      1      0      0           0      0      0
2   4      26.5       27       0     0     0          0      0      0      1           0      0      0
3   1      51.5       52       0     0     0          0      0      0      0           1      0      0
3   2      52         52.5     0     0     0          0      0      0      0           0      1      0
3   3      52.5       53       0     0     0          0      0      0      0           0      0      1
My first question would be: why do you need those dummy variables? Procedures with a CLASS statement generate the dummies you need internally.
data have;
  input ID visit age_start age_end;
  /* age1 = 17.5-18 band, age2 = 18-18.5, ..., age151 = 92.5-93 band */
  array age{151} age1-age151;
  do i = 1 to 151;
    /* (age_start-17)*2 is the 1-based index of this row's band */
    if i = (age_start-17)*2 then age{i} = 1;
    else age{i} = 0;
  end;
  drop i;
  datalines;
1 1 17.5 18
1 2 18 18.5
1 3 18.5 19
2 1 25 25.5
2 2 25.5 26
2 4 26.5 27
3 1 51.5 52
3 2 52 52.5
3 3 52.5 53
;
run;
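A compact variant of the same array idea, offered as a sketch rather than a replacement: it computes the band index once per row (instead of re-evaluating the formula inside the loop), uses ROUND() in case age_start carries tiny floating-point noise, and logs any age_start outside 17.5-92.5. It assumes every age_start is a multiple of 0.5; the dataset name want and the variable k are hypothetical, and three rows of the sample data are re-created so the block runs on its own.

```sas
/* A few rows of the sample data from the question */
data have;
  input ID visit age_start age_end;
  datalines;
1 1 17.5 18
2 4 26.5 27
3 3 52.5 53
;
run;

data want;
  set have;
  /* age1 = band 17.5-18, age2 = 18-18.5, ..., age151 = band 92.5-93 */
  array age{151} age1-age151;
  /* hypothetical helper k: 1-based band index, ROUND() absorbs float noise */
  k = round((age_start - 17.5) * 2) + 1;
  if not (1 <= k <= 151) then
    putlog "WARNING: age_start outside 17.5-92.5: " ID= age_start=;
  do i = 1 to 151;
    /* a SAS comparison evaluates to 1 (true) or 0 (false) */
    age{i} = (i = k);
  end;
  drop i k;
run;
```

A row whose age_start falls outside the range gets all zeros plus a log warning rather than an array-subscript error.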
I need to export my data to run analyses in Stata, so I thought I needed to generate the indicator variables for it. I tried running the code you gave, but the data step never finished and I eventually had to break it.
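For the Stata side, a minimal sketch of writing a SAS dataset out as a Stata .dta file with PROC EXPORT. It assumes SAS/ACCESS Interface to PC Files is licensed (DBMS=DTA depends on it); the dataset name to_stata, the variable band and the output path are hypothetical placeholders. It exports an integer band index alongside age_start because Stata's factor-variable notation (the i. prefix) only accepts nonnegative integer values, so 17.5, 18, ... could not be used there directly.

```sas
/* Stand-in for the dataset to export; in practice use your own (e.g. HAVE) */
data to_stata;
  input ID visit age_start age_end;
  /* hypothetical integer band index 0-150: 17.5-18 -> 0, 18-18.5 -> 1, ... */
  band = round((age_start - 17.5) * 2);
  datalines;
1 1 17.5 18
1 2 18 18.5
2 4 26.5 27
;
run;

/* Hypothetical output path: change it to a folder you can write to.          */
/* DBMS=DTA writes a Stata .dta file (needs SAS/ACCESS Interface to PC Files). */
proc export data=to_stata
    outfile="/tmp/to_stata.dta"
    dbms=dta
    replace;
run;
```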
@bgosiker - the program as posted by @arthurcavila works fine for me. You have obviously changed it, so post the code you are trying to run.
Doesn't Stata generate indicator variables when you fit a model? And doesn't SAS let you run the same models?
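Both replies suggest letting a procedure build the indicators from a CLASS variable. If explicit 0/1 columns are still wanted, one approach (swapped in here, not taken from the thread) is to have PROC GLMSELECT save the design matrix its CLASS statement creates, via the OUTDESIGN= option. A sketch under these assumptions: have_y and want_design are hypothetical names, and _y is a placeholder response that exists only because the MODEL statement needs one. Unlike the array approach, this creates columns only for age_start values that actually occur in the data, and the procedure chooses the column names.

```sas
/* A few rows of the question's data, plus a constant placeholder        */
/* response _y, because the MODEL statement requires a dependent variable */
data have_y;
  input ID visit age_start age_end;
  _y = 0;
  datalines;
1 1 17.5 18
1 2 18 18.5
2 4 26.5 27
;
run;

/* CLASS age_start builds one 0/1 column per observed age_start value.     */
/* NOINT keeps a column for every level; SELECTION=NONE fits the model as  */
/* given. OUTDESIGN= saves the design matrix; ADDINPUTVARS keeps ID, visit, */
/* age_start and age_end next to the dummies; the placeholder _y is dropped. */
proc glmselect data=have_y noprint outdesign(addinputvars)=want_design(drop=_y);
  class age_start;
  model _y = age_start / noint selection=none;
run;
```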