BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Adam1
Calcite | Level 5

Hello everyone!

I've been struggling for a few days to solve the this task.

The setting/study

- Observational data. Patients with Crohns Disease. Data was collected annually during 2002–2013 in a large region in Sweden.

- Patients can be included any year and visits may be irregular on a annual basis (some patients might visit the clinic every year, other come only one time during the study period).

- I know the exact day of death for each patient. VARIABLE: DEATH_YEAR

- I know the exact day of relapse (one of the endpoints of interest). VARIABLE: RELAPSE_YEAR

I am interested in the incidence of relapse and I need to calculate the number of relapses each year divided by the number of individuals alive that year. Now the problem is that from inclusion, individuals come irregularly, but I do know if they are actually alive that year and if they have experienced a relapse.

I could solve this (if I only could...) if I could create 12 new variables for each patient. Each new variable should be the calendar year and this variable should be set to '1' if the patient is alive that year.

Thus the problem is that i need to create a 'year-variables' that are set to '1' for each year at inclusion and thereafter, given that the person is not dead, or has experienced the event.

An example:

Patient X was included 2005 and died 2009. For him I would need he following variables: '2005', '2006', '2007', '2008' and '2009' set to '1'.

Patient Y was included 2005 and experienced event 2007. For him I would need the following variables: '2005', '2006', 2007' set to '1'. (Yes, year of event/death need still be set to '1').

I would be extremely grateful for any advice on this!

Thanks in advance!

/adam

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

The code requires some minor modifications.

1. Change the start year to your start year variable, first_visit

2. Remove the initializing of the variable, so it isn't set to 0 by default.

3. Calculate the end of the loop/indicator by taking the minimum of 2013, the relapse year and death year.

data want;

set have;

array year_indicator(2002:2013) year2002-year2013; *create an array that uses the index 2002 to 2013 and is initialized to 0;

year_end=min(2012, relapse_year, death_year);

do year=first_visit to year_end;

year_indicator(year)=1;

end;

run;

View solution in original post

4 REPLIES 4
Reeza
Super User

Arrays are your friend. I think this gets you closer but not 100% sure what you're after, its often easiest if you post a small sample of what your data looks like and what the output should look like.

data want;

set have;

array year_indicator(2002:2013) year2002-year2013 (0*12); *create an array that uses the index 2002 to 2013 and is initialized to 0;

do year=relapse_year to death_year;

year_indicator(year)=1;

end;

run;

Adam1
Calcite | Level 5

Thank You Reeza for Your effort.

Unfortunately it seems that I did not explain the task good enough.

I tried Your array, and also modify it in several ways, without any success but thats probably just me.

Here is an example data set with three patients.

data have;

input patient visit first_visit relapse_year death_year;

datalines;

1 2003 2003 . 2010

1 2004 2003 . 2010

1 2009 2003 . 2010

2 2002 2002 2006 .

2 2006 2002 2006 .

2 2006 2002 2006 .

2 2008 2002 2006 .

2 2012 2002 2006 .

3 2004 2004 . .

3 2008 2004 . .

3 2008 2004 . .

;

run;

The year variable, which You have called year2002-2012 should be set to '1' between the patients first visit year and the last year; the last year is either 2012 (if patient did not die, nor experienced event) or the year of death/event. The array should therefore lead to following results (computed variables are bold):

patientvisitfirst_visitrelapse_yeardeath_yearYEAR2002YEAR2003YEAR2004YEAR2005YEAR2006YEAR2007YEAR2008YEAR2009YEAR2010YEAR2011YEAR2012
120032003.2010.11111111..
120042003.2010.11111111..
120092003.2010.11111111..
2200220022006.11111......
2200620022006.11111......
2200620022006.11111......
2200820022006.11111......
2201220022006.11111......
320042004....111111111
320082004....111111111
320082004....111111111

Many many thanks in advance for Your time and effort.

Reeza
Super User

The code requires some minor modifications.

1. Change the start year to your start year variable, first_visit

2. Remove the initializing of the variable, so it isn't set to 0 by default.

3. Calculate the end of the loop/indicator by taking the minimum of 2013, the relapse year and death year.

data want;

set have;

array year_indicator(2002:2013) year2002-year2013; *create an array that uses the index 2002 to 2013 and is initialized to 0;

year_end=min(2012, relapse_year, death_year);

do year=first_visit to year_end;

year_indicator(year)=1;

end;

run;

Adam1
Calcite | Level 5

That solved the problem nicely.

What a fantastic way to solve this problem; my data step (4 pages of code...) would not have impressed You.

Thanks again.

Smiley Happy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 931 views
  • 0 likes
  • 2 in conversation