DATA Step, Macro, Functions and more

Sum variable by relative dates

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 138
Accepted Solution

Sum variable by relative dates

Hi,

I have two datasets. Dataset 1 is at the person level and has ID, a date variable, and a series of variables named for year/month combinations. A value of 0 for a year/month variable means that the person was absent in that year/month; a value of 1 means that they were present:

ID     Date1     Jan2010     Feb2010....etc through Mar2015

1       1/4/12         0                 1                                   0

2       5/9/12         1                 0                                   0

3       4/12/13       1                 1                                   1

Dataset 2 is at the person/date level; that is, each row is one person on one date. Each row also has a number associated with it (number of events):

ID     Date2     Num_events

1      2/3/2014      5

1     10/12/2014   1

1      11/5/2014    2

2       1/1/2012     2

2       5/7/2013     3

I have joined the datasets on ID, and I need two different things:

     1. I need every person's total num_events by month/year combo--however, this needs to be cross-referenced with the year/month      variables, because if they have a value of 1 for that year/month but no events, they should have a value of 0 rather than missing.

          For example, ID 1 was present in Feb 2010 because they have Feb2010=1. In Dataset 2, they have no events in Feb 2010           (because they have no record for Feb 2010). Therefore, ID 1's total events for Feb 2010 should=0. However, they were absent           in Jan 2010 (Jan2010 var=0), so their event total for Jan 2010 will be null, not 0.

    

     2. Even more complicated--ultimately what I want is to have everyone's event total relative to their own Date1 (the date variable in      Dataset 1). That is, the event total for ID 1 in Dec 2011 would be -1 month from their Date1 var, but for ID 2 it would be -5 months      from their Date1 var. 

Basically, I can't figure out how to reconcile all of these elements. I was able to create a dataset of total events by relative month by calculating a new variable which counted the number of months between Date1 and Date2 (mth_diff), then doing

proc sql;

create table mthly_sums as

select ID, mth_diff, sum(num_events) as tot

from have

group by ID, mth_diff;

quit;

But that doesn't incorporate the mth/year vars and the fact that people who were present but had no events should be 0 and not missing.


Accepted Solutions
Solution
‎07-15-2015 07:44 PM
PROC Star
Posts: 1,564

Re: Sum variable by relative dates

This gives the values you are after.

data T1;
 
input ID DATE1 mmddyy10.

        PRESENT_201001 PRESENT_201002 PRESENT201003 PRESENT201004 PRESENT201005 PRESENT201006
        PRESENT_201007 PRESENT_201008 PRESENT201009 PRESENT201010 PRESENT201011 PRESENT201012;
  DATE1=intnx(
'month',DATE1,0);
 
format DATE1 monyy.;
cards ;
1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1
3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1
4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0
5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
run;

data T1A (keep=ID DATE1 DATE2 PRESENT); %* Prepare data for merge;

   set T1;
  
array DATES [12] PRESENT:;
  
do I=1to12;
     DATE2=intnx(
'month','01jan2010'd,I-1);
     PRESENT=DATES;
    
output;
  
end;
  
format DATE2 monyy.;
run;

data T2;
 
input ID    DATE2 : mmddyy.     EVENT_NUM ;
  DATE2=intnx(
'month',DATE2,0);
 
format DATE2 monyy.;
cards;
1      1/9/2010      7
1      1/21/2010     23
1      2/4/2010      2
1      2/7/2010      2
1      4/5/2010      4
1      4/10/2010     1
1      4/21/2010     9
1      5/4/2010      1
1      5/9/2010      7
1      6/12/2010     33
2      5/4/2010      6
2      5/12/2010     74
2      5/30/2010     80
2      6/4/2010      1
2      6/20/2010     52
2      10/12/2010    63
2      10/29/2010    78
2      10/31/2010    33
3      1/24/2010     12
3      1/27/2010     10
3      2/3/2010      3
3      2/17/2010     19
3      7/9/2010      2
3      7/14/2010     55
4      1/23/2010     29
4      1/27/2010     44
4      1/29/2010     35
4      3/5/2010      4
4      3/14/2010     92
4      7/2/2010      4
4      8/9/2010      3
5      1/5/2010      2
5      1/23/2010     49
5      1/24/2010     14
5      9/5/2010      0
5      9/7/2010      11
5      9/12/2010     1
5      12/1/2010     4
5      12/31/2010    14
run;
proc summarydata=T2 nway%* Prepare data for merge;
 
class ID DATE2;
 
var EVENT_NUM;
 
outputout=T2A(keep=ID DATE2 EVENT_NUM) sum=;
run;

data T3; %* Merge;

  merge T1A T2A;
 
by ID DATE2;
  TOTAL= ifn(PRESENT=
1, max(0, EVENT_NUM ), .) ;
run;

View solution in original post


All Replies
PROC Star
Posts: 1,564

Re: Sum variable by relative dates

Give us a small set of data to play with:

data set have1

data set have2

data set want

Respected Advisor
Posts: 3,899

Re: Sum variable by relative dates

To give you an answer here I have to try and test and I don't want to do all the work. So please provide proper data steps creating sample data and then show us how the desired result needs to look like.

Frequent Contributor
Posts: 138

Re: Sum variable by relative dates

Here is the code to create Dataset1 (I used just one year of data):

data dataset1;

input id date1 mmddyy10. present_201001 present_201002 present201003 present201004 present201005 present201006

present_201007 present_201008 present201009 present201010 present201011 present201012;

format date1 mmddyy10.;

cards ;

1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1

2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1

3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1

4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0

5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1

;

run;

This is the output dataset when you run that code:

iddate1present_201001present_201002present_201003present_201004present_201005present_201006present_201007present_201008present_201009present_201010present_201011present_201012
11/7/2010111111111111
25/9/2010000011111111
34/12/2010111111111111
43/19/2010111111111000
512/7/2010111111111111

Here is the code for Dataset2:

data dataset2;

input id date2 mmddyy10. event_num;

format date2 mmddyy10.;

cards ;

1 1/9/2010 17

1 1/21/2010 23

1 2/4/2010 32

1 2/7/2010 12

1 4/5/2010 14

1 4/10/2010 1

1 4/21/2010 9

1 5/4/2010 91

1 5/9/2010 17

1 6/12/2010 33

2 5/4/2010 56

2 5/12/2010 74

2 5/30/2010 80

2 6/4/2010 11

2 6/20/2010 52

2 10/12/2010 63

2 10/29/2010  78

2 10/31/2010 33

3 1/24/2010 12

3 1/27/2010 10

3 2/3/2010 8

3 2/7/2010 9

3 2/17/2010 19

3 7/9/2010 22

3 7/14/2010 55

4 1/23/2010 29

4 1/27/2010 44

4 1/29/2010 35

4 3/5/2010 74

4 3/14/2010 92

4 7/2/2010 2

4 7/5/2010 13

4 8/9/2010 53

5 1/5/2010 72

5 1/23/2010 49

5 1/24/2010 14

5 9/5/2010 20

5 9/7/2010  11

5 9/12/2010 1

5 12/1/2010 4

5 12/31/2010 14

;

run;

The output dataset:

iddate2event_num
11/9/20107
11/21/201023
12/4/20102
12/7/20102
14/5/20104
14/10/20101
14/21/20109
15/4/20101
15/9/20107
16/12/201033
25/4/20106
25/12/201074
25/30/201080
26/4/20101
26/20/201052
210/12/201063
210/29/201078
210/31/201033
31/24/201012
31/27/201010
32/3/20103
32/17/201019
37/9/20102
37/14/201055
41/23/201029
41/27/201044
41/29/201035
43/5/20104
43/14/201092
47/2/20104
48/9/20103
51/5/20102
51/23/201049
51/24/201014
59/5/20100
59/7/201011
59/12/20101
512/1/20104
512/31/201014

The format of the dataset I want doesn't matter--the sums could be in columns or in rows, but the basic idea is:

     1. Sums of event_num in each month relative to each person's date 1 (ie, -1 month, etc)

     2. If a person does not have any events in a given month, they should have a missing value        for the sum variable for that month UNLESS the present_201001 etc. vars from Dataset1           indicate that they were present in that month (=1), in which case, they should have a value of     0.

So one possible final dataset would be like this:

IDPre_date1_12_mthsPre_date1_11_mthsPre_date1_10_mthsPre_date1_9_mthsPre_date1_8_mthsPre_date1_7_mthsPre_date1_6_mthsPre_date1_5_mthsPre_date1_4_mthsPre_date1_3_mthsPre_date1_2_mthsPre_date1_1_mthsSame_mth_as_Date1Post_date1_1_mthsPost_date1_2_mthsPost_date1_3_mthsPost_date1_4_mthsPost_date1_5_mthsPost_date1_6_mthsPost_date1_7_mthsPost_date1_8_mthsPost_date1_9_mthsPost_date1_10_mthsPost_date1_11_mthsPost_date1_12_mths
1............304014833000000.
2............1605300017400.....
3.........222200005700000....
4..........108096000430......
56500000000120018............

But it could also be the same numbers structured in a different way, like this:

ID     Relative_mth     total

1       -12                        .

etc.

Solution
‎07-15-2015 07:44 PM
PROC Star
Posts: 1,564

Re: Sum variable by relative dates

This gives the values you are after.

data T1;
 
input ID DATE1 mmddyy10.

        PRESENT_201001 PRESENT_201002 PRESENT201003 PRESENT201004 PRESENT201005 PRESENT201006
        PRESENT_201007 PRESENT_201008 PRESENT201009 PRESENT201010 PRESENT201011 PRESENT201012;
  DATE1=intnx(
'month',DATE1,0);
 
format DATE1 monyy.;
cards ;
1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1
3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1
4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0
5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
run;

data T1A (keep=ID DATE1 DATE2 PRESENT); %* Prepare data for merge;

   set T1;
  
array DATES [12] PRESENT:;
  
do I=1to12;
     DATE2=intnx(
'month','01jan2010'd,I-1);
     PRESENT=DATES;
    
output;
  
end;
  
format DATE2 monyy.;
run;

data T2;
 
input ID    DATE2 : mmddyy.     EVENT_NUM ;
  DATE2=intnx(
'month',DATE2,0);
 
format DATE2 monyy.;
cards;
1      1/9/2010      7
1      1/21/2010     23
1      2/4/2010      2
1      2/7/2010      2
1      4/5/2010      4
1      4/10/2010     1
1      4/21/2010     9
1      5/4/2010      1
1      5/9/2010      7
1      6/12/2010     33
2      5/4/2010      6
2      5/12/2010     74
2      5/30/2010     80
2      6/4/2010      1
2      6/20/2010     52
2      10/12/2010    63
2      10/29/2010    78
2      10/31/2010    33
3      1/24/2010     12
3      1/27/2010     10
3      2/3/2010      3
3      2/17/2010     19
3      7/9/2010      2
3      7/14/2010     55
4      1/23/2010     29
4      1/27/2010     44
4      1/29/2010     35
4      3/5/2010      4
4      3/14/2010     92
4      7/2/2010      4
4      8/9/2010      3
5      1/5/2010      2
5      1/23/2010     49
5      1/24/2010     14
5      9/5/2010      0
5      9/7/2010      11
5      9/12/2010     1
5      12/1/2010     4
5      12/31/2010    14
run;
proc summarydata=T2 nway%* Prepare data for merge;
 
class ID DATE2;
 
var EVENT_NUM;
 
outputout=T2A(keep=ID DATE2 EVENT_NUM) sum=;
run;

data T3; %* Merge;

  merge T1A T2A;
 
by ID DATE2;
  TOTAL= ifn(PRESENT=
1, max(0, EVENT_NUM ), .) ;
run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 289 views
  • 0 likes
  • 3 in conversation