Creating a risk variable, Network Analysis

Accepted Solution Solved
Reply
Regular Contributor
Posts: 150
Accepted Solution

Creating a risk variable, Network Analysis

I asked a similar question the other day but now I need something a little different. I am going to conduct a social network analysis and here is what I have, the data below I just made up as an example:

ID    School   Startdate     Enddate    disease_positive

1          1          39167          39923          1

2          1          39167          39923          0

3          2          39167          39923          0

4          1          39109          39211          0

We are trying to determine if being at the same school with someone who is disease positive increases your susceptibility to disease. It is somewhat complicated to explain, I am attaching a figure and a little explanation in a figure in powerpoint that I hope will help. Basically what we need to calculate the number of days that a participant is risk for disease, so the number of days that they are at the same school as a person who has disease. Participants can be moving in and out of school and their risk will change. The infectious period is for 1 year after a participant dies from disease, due to environmental exposure. The number of days at risk should increase the more days a participant who is not infected is housed with a participant who is infected and if the infected participant dies then the risk stabilizes and stay constant for a 1 year period, then at 1 year drops down to no risk over the number of days the participant was originally at risk. So if two participants were housed together for 20 days before the infected partipicipant dies, on day 1 risk=1, day 2 risk=2, etc... then on day 20 when the infected participant dies, the number of days at risk stays 20 for 1 year, then at 1 year, decreases for 20 days back to 0. If they move to a new school and encounter an infected participant, or an infected participant moves into their school then their risk increases from their last risk so using the last example, if it was 20, then the first day they are in the same school as the new person who is infected their risk=21 and so on.

I do not know how to convert the data to do this so any help would be greatly appreciated! Please let me know if you need additional explanation, I know this is confusing and it took me a while to wrap my head around.


Accepted Solutions
Solution
‎05-03-2013 09:22 AM
Super Contributor
Posts: 644

Re: Creating a risk variable, Network Analysis

Posted in reply to rfarmenta

I suggest the first step would be to consider the school to be "infected" on a particular date if one or more participants in the school are infected.  So separate out the infected participants in your study and group them by school.  In the data you have given school 1 is "infected" between 27 March 2011 and 21 April 2013.

Data infected

        at_risk ;

     Input ID School  Startdate     Enddate    disease_positive ;

     Startdate = Startdate - 20454 ; 

     Enddate  = Enddate - 20454 ; /* correction factor Excel to SAS dates */

     Format Startdate Enddate DATE9. ;

     If  disease_positive = 1 then output infected ;

     Else output at_risk ;

Datalines ;

1          1          39167          39923          1

2          1          39167          39923          0

3          2          39167          39923          0

4          1          39109          39211          0

;

Now create for each infected school one record per day infected:

Data school_risk_dates ;

     Set infected ;

     Format risk_date DATE. ;

     Do Startdate to Enddate ;

          Output ;

     End ;

     Keep school risk_date ;

Run ;

At this point you might like to consider doubling the risk if two infected participants are in the school at the same time ;

Proc SQL ;

     Create table school_risk as

          Select distinct school

               , risk_date

               , count(*)  as risk     /* replace with   , 1 as risk   if you don't like my idea  */

          From school_risk_dates

          Group by                    /* Order by */

                 school

               , risk_date

      ;

Quit ;

Now repeat the process for the at_risk participants, adding ID as well as school, to generate a student_in_school table for each day (school_date) each student was in any school.

You need to extend each student's timeline to the end of your study timeframe - more than 1 year past the date of last risk exposure.  Say 2 years to be safe:  up to 30 April 2015.

Data calendar ;

     Format date DATE9. ;

     Do Date = '01JAN2011'D to '30APR2015'D ;

          output ;

     end ;

run ;

Proc SQL ;

     Create table student_calendar as

          Select ID

               , date

          From  student_in_school

               , calendar

          Order by

                 ID

               , date

     ;


/* Now these tables can be joined */

Proc SQL ;

     Create table student_at_risk as

          Select distinct

                 std.ID

               , std.school

               , cal.date

               , rsk.risk

          from student_in_school  std

               ,  school_risk            rsk

                    right join

                  student_calendar   cal

               On std.school_date = cal.date

               and std.Id = cal.ID

          Where std.school = rsk.school

               and std.school_date = rsk.risk_date

          Order by

                 std.ID

               , cal.date

         ;

Quit ;

    

Now we can "aggregate" the risk per student :

Data student_risk ;

     Set student_at_risk ;

     By     ID

               date

               ;

     Retain     exposure_risk

                    last_exposure

                    ;

     Format     last_exposure      DATE9. ;

     prev_risk = lag(risk) ;

     If      first.ID      then

          Do ;

                    exposure_risk = 0 ;

                    last_exposure = . ;

                    prev_risk = 0 ;

          End ;

     if   last_exposure = .

           and risk = 0

        then

           Do ;

               output ;

               return ;

          end ;

     If   risk > 0 then  

          Do ;

               exposure_risk + risk ;

               output ;

          End ;

     Else

          Do ;

               If  prev_risk > 0 then last_exposure = date - 1 ;

               If  date > (last_exposure + 365) then exposure_risk + (-1) ;

               exposure_risk = max (exposure_risk, 0) ;

               output ;

          End ;

     Drop     prev_risk ;

Run ;

Warning: Untested code!

Richard

View solution in original post


All Replies
Solution
‎05-03-2013 09:22 AM
Super Contributor
Posts: 644

Re: Creating a risk variable, Network Analysis

Posted in reply to rfarmenta

I suggest the first step would be to consider the school to be "infected" on a particular date if one or more participants in the school are infected.  So separate out the infected participants in your study and group them by school.  In the data you have given school 1 is "infected" between 27 March 2011 and 21 April 2013.

Data infected

        at_risk ;

     Input ID School  Startdate     Enddate    disease_positive ;

     Startdate = Startdate - 20454 ; 

     Enddate  = Enddate - 20454 ; /* correction factor Excel to SAS dates */

     Format Startdate Enddate DATE9. ;

     If  disease_positive = 1 then output infected ;

     Else output at_risk ;

Datalines ;

1          1          39167          39923          1

2          1          39167          39923          0

3          2          39167          39923          0

4          1          39109          39211          0

;

Now create for each infected school one record per day infected:

Data school_risk_dates ;

     Set infected ;

     Format risk_date DATE. ;

     Do Startdate to Enddate ;

          Output ;

     End ;

     Keep school risk_date ;

Run ;

At this point you might like to consider doubling the risk if two infected participants are in the school at the same time ;

Proc SQL ;

     Create table school_risk as

          Select distinct school

               , risk_date

               , count(*)  as risk     /* replace with   , 1 as risk   if you don't like my idea  */

          From school_risk_dates

          Group by                    /* Order by */

                 school

               , risk_date

      ;

Quit ;

Now repeat the process for the at_risk participants, adding ID as well as school, to generate a student_in_school table for each day (school_date) each student was in any school.

You need to extend each student's timeline to the end of your study timeframe - more than 1 year past the date of last risk exposure.  Say 2 years to be safe:  up to 30 April 2015.

Data calendar ;

     Format date DATE9. ;

     Do Date = '01JAN2011'D to '30APR2015'D ;

          output ;

     end ;

run ;

Proc SQL ;

     Create table student_calendar as

          Select ID

               , date

          From  student_in_school

               , calendar

          Order by

                 ID

               , date

     ;


/* Now these tables can be joined */

Proc SQL ;

     Create table student_at_risk as

          Select distinct

                 std.ID

               , std.school

               , cal.date

               , rsk.risk

          from student_in_school  std

               ,  school_risk            rsk

                    right join

                  student_calendar   cal

               On std.school_date = cal.date

               and std.Id = cal.ID

          Where std.school = rsk.school

               and std.school_date = rsk.risk_date

          Order by

                 std.ID

               , cal.date

         ;

Quit ;

    

Now we can "aggregate" the risk per student :

Data student_risk ;

     Set student_at_risk ;

     By     ID

               date

               ;

     Retain     exposure_risk

                    last_exposure

                    ;

     Format     last_exposure      DATE9. ;

     prev_risk = lag(risk) ;

     If      first.ID      then

          Do ;

                    exposure_risk = 0 ;

                    last_exposure = . ;

                    prev_risk = 0 ;

          End ;

     if   last_exposure = .

           and risk = 0

        then

           Do ;

               output ;

               return ;

          end ;

     If   risk > 0 then  

          Do ;

               exposure_risk + risk ;

               output ;

          End ;

     Else

          Do ;

               If  prev_risk > 0 then last_exposure = date - 1 ;

               If  date > (last_exposure + 365) then exposure_risk + (-1) ;

               exposure_risk = max (exposure_risk, 0) ;

               output ;

          End ;

     Drop     prev_risk ;

Run ;

Warning: Untested code!

Richard

Super Contributor
Posts: 644

Re: Creating a risk variable, Network Analysis

Posted in reply to RichardinOz

By the way, my code assumes that an infected student is not completely cured during the test interval.

Richard

Regular Contributor
Posts: 150

Re: Creating a risk variable, Network Analysis

Posted in reply to RichardinOz

Thank you! Sorry for the delayed response. I got busy working on another project and just got a chance to look at this. With a few modifications I was able to get this to do what I needed!

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 246 views
  • 3 likes
  • 2 in conversation