Obsidian | Level 7

## Sum variable by relative dates

Hi,

I have two datasets. Dataset 1 is at the person level and has ID, a date variable, and a series of variables named for year/month combinations. A value of 0 for a year/month variable means that the person was absent in that year/month; a value of 1 means that they were present:

ID     Date1     Jan2010     Feb2010....etc through Mar2015

1       1/4/12         0                 1                                   0

2       5/9/12         1                 0                                   0

3       4/12/13       1                 1                                   1

Dataset 2 is at the person/date level; that is, each row is one person on one date. Each row also has a number associated with it (number of events):

ID     Date2     Num_events

1      2/3/2014      5

1     10/12/2014   1

1      11/5/2014    2

2       1/1/2012     2

2       5/7/2013     3

I have joined the datasets on ID, and I need two different things:

1. I need every person's total num_events by month/year combo--however, this needs to be cross-referenced with the year/month      variables, because if they have a value of 1 for that year/month but no events, they should have a value of 0 rather than missing.

For example, ID 1 was present in Feb 2010 because they have Feb2010=1. In Dataset 2, they have no events in Feb 2010           (because they have no record for Feb 2010). Therefore, ID 1's total events for Feb 2010 should=0. However, they were absent           in Jan 2010 (Jan2010 var=0), so their event total for Jan 2010 will be null, not 0.

2. Even more complicated--ultimately what I want is to have everyone's event total relative to their own Date1 (the date variable in      Dataset 1). That is, the event total for ID 1 in Dec 2011 would be -1 month from their Date1 var, but for ID 2 it would be -5 months      from their Date1 var.

Basically, I can't figure out how to reconcile all of these elements. I was able to create a dataset of total events by relative month by calculating a new variable which counted the number of months between Date1 and Date2 (mth_diff), then doing

proc sql;

create table mthly_sums as

select ID, mth_diff, sum(num_events) as tot

from have

group by ID, mth_diff;

quit;

But that doesn't incorporate the mth/year vars and the fact that people who were present but had no events should be 0 and not missing.

1 ACCEPTED SOLUTION

Accepted Solutions
Tourmaline | Level 20

## Re: Sum variable by relative dates

This gives the values you are after.

data T1;

input ID DATE1 mmddyy10.

PRESENT_201001 PRESENT_201002 PRESENT201003 PRESENT201004 PRESENT201005 PRESENT201006
PRESENT_201007 PRESENT_201008 PRESENT201009 PRESENT201010 PRESENT201011 PRESENT201012;
DATE1=intnx(
'month',DATE1,0);

format DATE1 monyy.;
cards ;
1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1
3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1
4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0
5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
run;

data T1A (keep=ID DATE1 DATE2 PRESENT); %* Prepare data for merge;

set T1;

array DATES [12] PRESENT:;

do I=1to12;
DATE2=intnx(
'month','01jan2010'd,I-1);
PRESENT=DATES;

output;

end;

format DATE2 monyy.;
run;

data T2;

input ID    DATE2 : mmddyy.     EVENT_NUM ;
DATE2=intnx(
'month',DATE2,0);

format DATE2 monyy.;
cards;
1      1/9/2010      7
1      1/21/2010     23
1      2/4/2010      2
1      2/7/2010      2
1      4/5/2010      4
1      4/10/2010     1
1      4/21/2010     9
1      5/4/2010      1
1      5/9/2010      7
1      6/12/2010     33
2      5/4/2010      6
2      5/12/2010     74
2      5/30/2010     80
2      6/4/2010      1
2      6/20/2010     52
2      10/12/2010    63
2      10/29/2010    78
2      10/31/2010    33
3      1/24/2010     12
3      1/27/2010     10
3      2/3/2010      3
3      2/17/2010     19
3      7/9/2010      2
3      7/14/2010     55
4      1/23/2010     29
4      1/27/2010     44
4      1/29/2010     35
4      3/5/2010      4
4      3/14/2010     92
4      7/2/2010      4
4      8/9/2010      3
5      1/5/2010      2
5      1/23/2010     49
5      1/24/2010     14
5      9/5/2010      0
5      9/7/2010      11
5      9/12/2010     1
5      12/1/2010     4
5      12/31/2010    14
run;
proc summarydata=T2 nway%* Prepare data for merge;

class ID DATE2;

var EVENT_NUM;

outputout=T2A(keep=ID DATE2 EVENT_NUM) sum=;
run;

data T3; %* Merge;

merge T1A T2A;

by ID DATE2;
TOTAL= ifn(PRESENT=
1, max(0, EVENT_NUM ), .) ;
run;

4 REPLIES 4
Tourmaline | Level 20

## Re: Sum variable by relative dates

Give us a small set of data to play with:

data set have1

data set have2

data set want

Opal | Level 21

## Re: Sum variable by relative dates

To give you an answer here I have to try and test and I don't want to do all the work. So please provide proper data steps creating sample data and then show us how the desired result needs to look like.

Obsidian | Level 7

## Re: Sum variable by relative dates

Here is the code to create Dataset1 (I used just one year of data):

data dataset1;

input id date1 mmddyy10. present_201001 present_201002 present201003 present201004 present201005 present201006

present_201007 present_201008 present201009 present201010 present201011 present201012;

format date1 mmddyy10.;

cards ;

1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1

2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1

3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1

4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0

5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1

;

run;

This is the output dataset when you run that code:

 id date1 present_201001 present_201002 present_201003 present_201004 present_201005 present_201006 present_201007 present_201008 present_201009 present_201010 present_201011 present_201012 1 1/7/2010 1 1 1 1 1 1 1 1 1 1 1 1 2 5/9/2010 0 0 0 0 1 1 1 1 1 1 1 1 3 4/12/2010 1 1 1 1 1 1 1 1 1 1 1 1 4 3/19/2010 1 1 1 1 1 1 1 1 1 0 0 0 5 12/7/2010 1 1 1 1 1 1 1 1 1 1 1 1

Here is the code for Dataset2:

data dataset2;

input id date2 mmddyy10. event_num;

format date2 mmddyy10.;

cards ;

1 1/9/2010 17

1 1/21/2010 23

1 2/4/2010 32

1 2/7/2010 12

1 4/5/2010 14

1 4/10/2010 1

1 4/21/2010 9

1 5/4/2010 91

1 5/9/2010 17

1 6/12/2010 33

2 5/4/2010 56

2 5/12/2010 74

2 5/30/2010 80

2 6/4/2010 11

2 6/20/2010 52

2 10/12/2010 63

2 10/29/2010  78

2 10/31/2010 33

3 1/24/2010 12

3 1/27/2010 10

3 2/3/2010 8

3 2/7/2010 9

3 2/17/2010 19

3 7/9/2010 22

3 7/14/2010 55

4 1/23/2010 29

4 1/27/2010 44

4 1/29/2010 35

4 3/5/2010 74

4 3/14/2010 92

4 7/2/2010 2

4 7/5/2010 13

4 8/9/2010 53

5 1/5/2010 72

5 1/23/2010 49

5 1/24/2010 14

5 9/5/2010 20

5 9/7/2010  11

5 9/12/2010 1

5 12/1/2010 4

5 12/31/2010 14

;

run;

The output dataset:

 id date2 event_num 1 1/9/2010 7 1 1/21/2010 23 1 2/4/2010 2 1 2/7/2010 2 1 4/5/2010 4 1 4/10/2010 1 1 4/21/2010 9 1 5/4/2010 1 1 5/9/2010 7 1 6/12/2010 33 2 5/4/2010 6 2 5/12/2010 74 2 5/30/2010 80 2 6/4/2010 1 2 6/20/2010 52 2 10/12/2010 63 2 10/29/2010 78 2 10/31/2010 33 3 1/24/2010 12 3 1/27/2010 10 3 2/3/2010 3 3 2/17/2010 19 3 7/9/2010 2 3 7/14/2010 55 4 1/23/2010 29 4 1/27/2010 44 4 1/29/2010 35 4 3/5/2010 4 4 3/14/2010 92 4 7/2/2010 4 4 8/9/2010 3 5 1/5/2010 2 5 1/23/2010 49 5 1/24/2010 14 5 9/5/2010 0 5 9/7/2010 11 5 9/12/2010 1 5 12/1/2010 4 5 12/31/2010 14

The format of the dataset I want doesn't matter--the sums could be in columns or in rows, but the basic idea is:

1. Sums of event_num in each month relative to each person's date 1 (ie, -1 month, etc)

2. If a person does not have any events in a given month, they should have a missing value        for the sum variable for that month UNLESS the present_201001 etc. vars from Dataset1           indicate that they were present in that month (=1), in which case, they should have a value of     0.

So one possible final dataset would be like this:

 ID Pre_date1_12_mths Pre_date1_11_mths Pre_date1_10_mths Pre_date1_9_mths Pre_date1_8_mths Pre_date1_7_mths Pre_date1_6_mths Pre_date1_5_mths Pre_date1_4_mths Pre_date1_3_mths Pre_date1_2_mths Pre_date1_1_mths Same_mth_as_Date1 Post_date1_1_mths Post_date1_2_mths Post_date1_3_mths Post_date1_4_mths Post_date1_5_mths Post_date1_6_mths Post_date1_7_mths Post_date1_8_mths Post_date1_9_mths Post_date1_10_mths Post_date1_11_mths Post_date1_12_mths 1 . . . . . . . . . . . . 30 4 0 14 8 33 0 0 0 0 0 0 . 2 . . . . . . . . . . . . 160 53 0 0 0 174 0 0 . . . . . 3 . . . . . . . . . 22 22 0 0 0 0 57 0 0 0 0 0 . . . . 4 . . . . . . . . . . 108 0 96 0 0 0 4 3 0 . . . . . . 5 65 0 0 0 0 0 0 0 0 12 0 0 18 . . . . . . . . . . . .

But it could also be the same numbers structured in a different way, like this:

ID     Relative_mth     total

1       -12                        .

etc.

Tourmaline | Level 20

## Re: Sum variable by relative dates

This gives the values you are after.

data T1;

input ID DATE1 mmddyy10.

PRESENT_201001 PRESENT_201002 PRESENT201003 PRESENT201004 PRESENT201005 PRESENT201006
PRESENT_201007 PRESENT_201008 PRESENT201009 PRESENT201010 PRESENT201011 PRESENT201012;
DATE1=intnx(
'month',DATE1,0);

format DATE1 monyy.;
cards ;
1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1
3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1
4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0
5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
run;

data T1A (keep=ID DATE1 DATE2 PRESENT); %* Prepare data for merge;

set T1;

array DATES [12] PRESENT:;

do I=1to12;
DATE2=intnx(
'month','01jan2010'd,I-1);
PRESENT=DATES;

output;

end;

format DATE2 monyy.;
run;

data T2;

input ID    DATE2 : mmddyy.     EVENT_NUM ;
DATE2=intnx(
'month',DATE2,0);

format DATE2 monyy.;
cards;
1      1/9/2010      7
1      1/21/2010     23
1      2/4/2010      2
1      2/7/2010      2
1      4/5/2010      4
1      4/10/2010     1
1      4/21/2010     9
1      5/4/2010      1
1      5/9/2010      7
1      6/12/2010     33
2      5/4/2010      6
2      5/12/2010     74
2      5/30/2010     80
2      6/4/2010      1
2      6/20/2010     52
2      10/12/2010    63
2      10/29/2010    78
2      10/31/2010    33
3      1/24/2010     12
3      1/27/2010     10
3      2/3/2010      3
3      2/17/2010     19
3      7/9/2010      2
3      7/14/2010     55
4      1/23/2010     29
4      1/27/2010     44
4      1/29/2010     35
4      3/5/2010      4
4      3/14/2010     92
4      7/2/2010      4
4      8/9/2010      3
5      1/5/2010      2
5      1/23/2010     49
5      1/24/2010     14
5      9/5/2010      0
5      9/7/2010      11
5      9/12/2010     1
5      12/1/2010     4
5      12/31/2010    14
run;
proc summarydata=T2 nway%* Prepare data for merge;

class ID DATE2;

var EVENT_NUM;

outputout=T2A(keep=ID DATE2 EVENT_NUM) sum=;
run;

data T3; %* Merge;

merge T1A T2A;

by ID DATE2;
TOTAL= ifn(PRESENT=
1, max(0, EVENT_NUM ), .) ;
run;

Discussion stats
• 4 replies
• 1237 views
• 0 likes
• 3 in conversation