## Creating a risk variable, Network Analysis

Solved
Regular Contributor
Posts: 150

# Creating a risk variable, Network Analysis

I asked a similar question the other day but now I need something a little different. I am going to conduct a social network analysis and here is what I have, the data below I just made up as an example:

ID    School   Startdate     Enddate    disease_positive

1          1          39167          39923          1

2          1          39167          39923          0

3          2          39167          39923          0

4          1          39109          39211          0

We are trying to determine if being at the same school with someone who is disease positive increases your susceptibility to disease. It is somewhat complicated to explain, I am attaching a figure and a little explanation in a figure in powerpoint that I hope will help. Basically what we need to calculate the number of days that a participant is risk for disease, so the number of days that they are at the same school as a person who has disease. Participants can be moving in and out of school and their risk will change. The infectious period is for 1 year after a participant dies from disease, due to environmental exposure. The number of days at risk should increase the more days a participant who is not infected is housed with a participant who is infected and if the infected participant dies then the risk stabilizes and stay constant for a 1 year period, then at 1 year drops down to no risk over the number of days the participant was originally at risk. So if two participants were housed together for 20 days before the infected partipicipant dies, on day 1 risk=1, day 2 risk=2, etc... then on day 20 when the infected participant dies, the number of days at risk stays 20 for 1 year, then at 1 year, decreases for 20 days back to 0. If they move to a new school and encounter an infected participant, or an infected participant moves into their school then their risk increases from their last risk so using the last example, if it was 20, then the first day they are in the same school as the new person who is infected their risk=21 and so on.

I do not know how to convert the data to do this so any help would be greatly appreciated! Please let me know if you need additional explanation, I know this is confusing and it took me a while to wrap my head around.

Accepted Solutions
Solution
‎05-03-2013 09:22 AM
Super Contributor
Posts: 644

## Re: Creating a risk variable, Network Analysis

I suggest the first step would be to consider the school to be "infected" on a particular date if one or more participants in the school are infected.  So separate out the infected participants in your study and group them by school.  In the data you have given school 1 is "infected" between 27 March 2011 and 21 April 2013.

Data infected

at_risk ;

Input ID School  Startdate     Enddate    disease_positive ;

Startdate = Startdate - 20454 ;

Enddate  = Enddate - 20454 ; /* correction factor Excel to SAS dates */

Format Startdate Enddate DATE9. ;

If  disease_positive = 1 then output infected ;

Else output at_risk ;

Datalines ;

1          1          39167          39923          1

2          1          39167          39923          0

3          2          39167          39923          0

4          1          39109          39211          0

;

Now create for each infected school one record per day infected:

Data school_risk_dates ;

Set infected ;

Format risk_date DATE. ;

Do Startdate to Enddate ;

Output ;

End ;

Keep school risk_date ;

Run ;

At this point you might like to consider doubling the risk if two infected participants are in the school at the same time ;

Proc SQL ;

Create table school_risk as

Select distinct school

, risk_date

, count(*)  as risk     /* replace with   , 1 as risk   if you don't like my idea  */

From school_risk_dates

Group by                    /* Order by */

school

, risk_date

;

Quit ;

Now repeat the process for the at_risk participants, adding ID as well as school, to generate a student_in_school table for each day (school_date) each student was in any school.

You need to extend each student's timeline to the end of your study timeframe - more than 1 year past the date of last risk exposure.  Say 2 years to be safe:  up to 30 April 2015.

Data calendar ;

Format date DATE9. ;

Do Date = '01JAN2011'D to '30APR2015'D ;

output ;

end ;

run ;

Proc SQL ;

Create table student_calendar as

Select ID

, date

From  student_in_school

, calendar

Order by

ID

, date

;

/* Now these tables can be joined */

Proc SQL ;

Create table student_at_risk as

Select distinct

std.ID

, std.school

, cal.date

, rsk.risk

from student_in_school  std

,  school_risk            rsk

right join

student_calendar   cal

On std.school_date = cal.date

and std.Id = cal.ID

Where std.school = rsk.school

and std.school_date = rsk.risk_date

Order by

std.ID

, cal.date

;

Quit ;

Now we can "aggregate" the risk per student :

Data student_risk ;

Set student_at_risk ;

By     ID

date

;

Retain     exposure_risk

last_exposure

;

Format     last_exposure      DATE9. ;

prev_risk = lag(risk) ;

If      first.ID      then

Do ;

exposure_risk = 0 ;

last_exposure = . ;

prev_risk = 0 ;

End ;

if   last_exposure = .

and risk = 0

then

Do ;

output ;

return ;

end ;

If   risk > 0 then

Do ;

exposure_risk + risk ;

output ;

End ;

Else

Do ;

If  prev_risk > 0 then last_exposure = date - 1 ;

If  date > (last_exposure + 365) then exposure_risk + (-1) ;

exposure_risk = max (exposure_risk, 0) ;

output ;

End ;

Drop     prev_risk ;

Run ;

Warning: Untested code!

Richard

All Replies
Solution
‎05-03-2013 09:22 AM
Super Contributor
Posts: 644

## Re: Creating a risk variable, Network Analysis

I suggest the first step would be to consider the school to be "infected" on a particular date if one or more participants in the school are infected.  So separate out the infected participants in your study and group them by school.  In the data you have given school 1 is "infected" between 27 March 2011 and 21 April 2013.

Data infected

at_risk ;

Input ID School  Startdate     Enddate    disease_positive ;

Startdate = Startdate - 20454 ;

Enddate  = Enddate - 20454 ; /* correction factor Excel to SAS dates */

Format Startdate Enddate DATE9. ;

If  disease_positive = 1 then output infected ;

Else output at_risk ;

Datalines ;

1          1          39167          39923          1

2          1          39167          39923          0

3          2          39167          39923          0

4          1          39109          39211          0

;

Now create for each infected school one record per day infected:

Data school_risk_dates ;

Set infected ;

Format risk_date DATE. ;

Do Startdate to Enddate ;

Output ;

End ;

Keep school risk_date ;

Run ;

At this point you might like to consider doubling the risk if two infected participants are in the school at the same time ;

Proc SQL ;

Create table school_risk as

Select distinct school

, risk_date

, count(*)  as risk     /* replace with   , 1 as risk   if you don't like my idea  */

From school_risk_dates

Group by                    /* Order by */

school

, risk_date

;

Quit ;

Now repeat the process for the at_risk participants, adding ID as well as school, to generate a student_in_school table for each day (school_date) each student was in any school.

You need to extend each student's timeline to the end of your study timeframe - more than 1 year past the date of last risk exposure.  Say 2 years to be safe:  up to 30 April 2015.

Data calendar ;

Format date DATE9. ;

Do Date = '01JAN2011'D to '30APR2015'D ;

output ;

end ;

run ;

Proc SQL ;

Create table student_calendar as

Select ID

, date

From  student_in_school

, calendar

Order by

ID

, date

;

/* Now these tables can be joined */

Proc SQL ;

Create table student_at_risk as

Select distinct

std.ID

, std.school

, cal.date

, rsk.risk

from student_in_school  std

,  school_risk            rsk

right join

student_calendar   cal

On std.school_date = cal.date

and std.Id = cal.ID

Where std.school = rsk.school

and std.school_date = rsk.risk_date

Order by

std.ID

, cal.date

;

Quit ;

Now we can "aggregate" the risk per student :

Data student_risk ;

Set student_at_risk ;

By     ID

date

;

Retain     exposure_risk

last_exposure

;

Format     last_exposure      DATE9. ;

prev_risk = lag(risk) ;

If      first.ID      then

Do ;

exposure_risk = 0 ;

last_exposure = . ;

prev_risk = 0 ;

End ;

if   last_exposure = .

and risk = 0

then

Do ;

output ;

return ;

end ;

If   risk > 0 then

Do ;

exposure_risk + risk ;

output ;

End ;

Else

Do ;

If  prev_risk > 0 then last_exposure = date - 1 ;

If  date > (last_exposure + 365) then exposure_risk + (-1) ;

exposure_risk = max (exposure_risk, 0) ;

output ;

End ;

Drop     prev_risk ;

Run ;

Warning: Untested code!

Richard

Super Contributor
Posts: 644

## Re: Creating a risk variable, Network Analysis

By the way, my code assumes that an infected student is not completely cured during the test interval.

Richard

Regular Contributor
Posts: 150