BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Walternate
Obsidian | Level 7

Hi,

I have two datasets. Dataset 1 is at the person level and has ID, a date variable, and a series of variables named for year/month combinations. A value of 0 for a year/month variable means that the person was absent in that year/month; a value of 1 means that they were present:

ID     Date1     Jan2010     Feb2010....etc through Mar2015

1       1/4/12         0                 1                                   0

2       5/9/12         1                 0                                   0

3       4/12/13       1                 1                                   1

Dataset 2 is at the person/date level; that is, each row is one person on one date. Each row also has a number associated with it (number of events):

ID     Date2     Num_events

1      2/3/2014      5

1     10/12/2014   1

1      11/5/2014    2

2       1/1/2012     2

2       5/7/2013     3

I have joined the datasets on ID, and I need two different things:

     1. I need every person's total num_events by month/year combo--however, this needs to be cross-referenced with the year/month      variables, because if they have a value of 1 for that year/month but no events, they should have a value of 0 rather than missing.

          For example, ID 1 was present in Feb 2010 because they have Feb2010=1. In Dataset 2, they have no events in Feb 2010           (because they have no record for Feb 2010). Therefore, ID 1's total events for Feb 2010 should=0. However, they were absent           in Jan 2010 (Jan2010 var=0), so their event total for Jan 2010 will be null, not 0.

    

     2. Even more complicated--ultimately what I want is to have everyone's event total relative to their own Date1 (the date variable in      Dataset 1). That is, the event total for ID 1 in Dec 2011 would be -1 month from their Date1 var, but for ID 2 it would be -5 months      from their Date1 var. 

Basically, I can't figure out how to reconcile all of these elements. I was able to create a dataset of total events by relative month by calculating a new variable which counted the number of months between Date1 and Date2 (mth_diff), then doing

proc sql;

create table mthly_sums as

select ID, mth_diff, sum(num_events) as tot

from have

group by ID, mth_diff;

quit;

But that doesn't incorporate the mth/year vars and the fact that people who were present but had no events should be 0 and not missing.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

This gives the values you are after.

data T1;
 
input ID DATE1 mmddyy10.

        PRESENT_201001 PRESENT_201002 PRESENT201003 PRESENT201004 PRESENT201005 PRESENT201006
        PRESENT_201007 PRESENT_201008 PRESENT201009 PRESENT201010 PRESENT201011 PRESENT201012;
  DATE1=intnx(
'month',DATE1,0);
 
format DATE1 monyy.;
cards ;
1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1
3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1
4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0
5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
run;

data T1A (keep=ID DATE1 DATE2 PRESENT); %* Prepare data for merge;

   set T1;
  
array DATES [12] PRESENT:;
  
do I=1to12;
     DATE2=intnx(
'month','01jan2010'd,I-1);
     PRESENT=DATES;
    
output;
  
end;
  
format DATE2 monyy.;
run;

data T2;
 
input ID    DATE2 : mmddyy.     EVENT_NUM ;
  DATE2=intnx(
'month',DATE2,0);
 
format DATE2 monyy.;
cards;
1      1/9/2010      7
1      1/21/2010     23
1      2/4/2010      2
1      2/7/2010      2
1      4/5/2010      4
1      4/10/2010     1
1      4/21/2010     9
1      5/4/2010      1
1      5/9/2010      7
1      6/12/2010     33
2      5/4/2010      6
2      5/12/2010     74
2      5/30/2010     80
2      6/4/2010      1
2      6/20/2010     52
2      10/12/2010    63
2      10/29/2010    78
2      10/31/2010    33
3      1/24/2010     12
3      1/27/2010     10
3      2/3/2010      3
3      2/17/2010     19
3      7/9/2010      2
3      7/14/2010     55
4      1/23/2010     29
4      1/27/2010     44
4      1/29/2010     35
4      3/5/2010      4
4      3/14/2010     92
4      7/2/2010      4
4      8/9/2010      3
5      1/5/2010      2
5      1/23/2010     49
5      1/24/2010     14
5      9/5/2010      0
5      9/7/2010      11
5      9/12/2010     1
5      12/1/2010     4
5      12/31/2010    14
run;
proc summarydata=T2 nway%* Prepare data for merge;
 
class ID DATE2;
 
var EVENT_NUM;
 
outputout=T2A(keep=ID DATE2 EVENT_NUM) sum=;
run;

data T3; %* Merge;

  merge T1A T2A;
 
by ID DATE2;
  TOTAL= ifn(PRESENT=
1, max(0, EVENT_NUM ), .) ;
run;

View solution in original post

4 REPLIES 4
ChrisNZ
Tourmaline | Level 20

Give us a small set of data to play with:

data set have1

data set have2

data set want

Patrick
Opal | Level 21

To give you an answer here I have to try and test and I don't want to do all the work. So please provide proper data steps creating sample data and then show us how the desired result needs to look like.

Walternate
Obsidian | Level 7

Here is the code to create Dataset1 (I used just one year of data):

data dataset1;

input id date1 mmddyy10. present_201001 present_201002 present201003 present201004 present201005 present201006

present_201007 present_201008 present201009 present201010 present201011 present201012;

format date1 mmddyy10.;

cards ;

1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1

2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1

3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1

4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0

5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1

;

run;

This is the output dataset when you run that code:

iddate1present_201001present_201002present_201003present_201004present_201005present_201006present_201007present_201008present_201009present_201010present_201011present_201012
11/7/2010111111111111
25/9/2010000011111111
34/12/2010111111111111
43/19/2010111111111000
512/7/2010111111111111

Here is the code for Dataset2:

data dataset2;

input id date2 mmddyy10. event_num;

format date2 mmddyy10.;

cards ;

1 1/9/2010 17

1 1/21/2010 23

1 2/4/2010 32

1 2/7/2010 12

1 4/5/2010 14

1 4/10/2010 1

1 4/21/2010 9

1 5/4/2010 91

1 5/9/2010 17

1 6/12/2010 33

2 5/4/2010 56

2 5/12/2010 74

2 5/30/2010 80

2 6/4/2010 11

2 6/20/2010 52

2 10/12/2010 63

2 10/29/2010  78

2 10/31/2010 33

3 1/24/2010 12

3 1/27/2010 10

3 2/3/2010 8

3 2/7/2010 9

3 2/17/2010 19

3 7/9/2010 22

3 7/14/2010 55

4 1/23/2010 29

4 1/27/2010 44

4 1/29/2010 35

4 3/5/2010 74

4 3/14/2010 92

4 7/2/2010 2

4 7/5/2010 13

4 8/9/2010 53

5 1/5/2010 72

5 1/23/2010 49

5 1/24/2010 14

5 9/5/2010 20

5 9/7/2010  11

5 9/12/2010 1

5 12/1/2010 4

5 12/31/2010 14

;

run;

The output dataset:

iddate2event_num
11/9/20107
11/21/201023
12/4/20102
12/7/20102
14/5/20104
14/10/20101
14/21/20109
15/4/20101
15/9/20107
16/12/201033
25/4/20106
25/12/201074
25/30/201080
26/4/20101
26/20/201052
210/12/201063
210/29/201078
210/31/201033
31/24/201012
31/27/201010
32/3/20103
32/17/201019
37/9/20102
37/14/201055
41/23/201029
41/27/201044
41/29/201035
43/5/20104
43/14/201092
47/2/20104
48/9/20103
51/5/20102
51/23/201049
51/24/201014
59/5/20100
59/7/201011
59/12/20101
512/1/20104
512/31/201014

The format of the dataset I want doesn't matter--the sums could be in columns or in rows, but the basic idea is:

     1. Sums of event_num in each month relative to each person's date 1 (ie, -1 month, etc)

     2. If a person does not have any events in a given month, they should have a missing value        for the sum variable for that month UNLESS the present_201001 etc. vars from Dataset1           indicate that they were present in that month (=1), in which case, they should have a value of     0.

So one possible final dataset would be like this:

IDPre_date1_12_mthsPre_date1_11_mthsPre_date1_10_mthsPre_date1_9_mthsPre_date1_8_mthsPre_date1_7_mthsPre_date1_6_mthsPre_date1_5_mthsPre_date1_4_mthsPre_date1_3_mthsPre_date1_2_mthsPre_date1_1_mthsSame_mth_as_Date1Post_date1_1_mthsPost_date1_2_mthsPost_date1_3_mthsPost_date1_4_mthsPost_date1_5_mthsPost_date1_6_mthsPost_date1_7_mthsPost_date1_8_mthsPost_date1_9_mthsPost_date1_10_mthsPost_date1_11_mthsPost_date1_12_mths
1............304014833000000.
2............1605300017400.....
3.........222200005700000....
4..........108096000430......
56500000000120018............

But it could also be the same numbers structured in a different way, like this:

ID     Relative_mth     total

1       -12                        .

etc.

ChrisNZ
Tourmaline | Level 20

This gives the values you are after.

data T1;
 
input ID DATE1 mmddyy10.

        PRESENT_201001 PRESENT_201002 PRESENT201003 PRESENT201004 PRESENT201005 PRESENT201006
        PRESENT_201007 PRESENT_201008 PRESENT201009 PRESENT201010 PRESENT201011 PRESENT201012;
  DATE1=intnx(
'month',DATE1,0);
 
format DATE1 monyy.;
cards ;
1 01/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
2 05/09/2010 0 0 0 0 1 1 1 1 1 1 1 1
3 04/12/2010 1 1 1 1 1 1 1 1 1 1 1 1
4 03/19/2010 1 1 1 1 1 1 1 1 1 0 0 0
5 12/07/2010 1 1 1 1 1 1 1 1 1 1 1 1
run;

data T1A (keep=ID DATE1 DATE2 PRESENT); %* Prepare data for merge;

   set T1;
  
array DATES [12] PRESENT:;
  
do I=1to12;
     DATE2=intnx(
'month','01jan2010'd,I-1);
     PRESENT=DATES;
    
output;
  
end;
  
format DATE2 monyy.;
run;

data T2;
 
input ID    DATE2 : mmddyy.     EVENT_NUM ;
  DATE2=intnx(
'month',DATE2,0);
 
format DATE2 monyy.;
cards;
1      1/9/2010      7
1      1/21/2010     23
1      2/4/2010      2
1      2/7/2010      2
1      4/5/2010      4
1      4/10/2010     1
1      4/21/2010     9
1      5/4/2010      1
1      5/9/2010      7
1      6/12/2010     33
2      5/4/2010      6
2      5/12/2010     74
2      5/30/2010     80
2      6/4/2010      1
2      6/20/2010     52
2      10/12/2010    63
2      10/29/2010    78
2      10/31/2010    33
3      1/24/2010     12
3      1/27/2010     10
3      2/3/2010      3
3      2/17/2010     19
3      7/9/2010      2
3      7/14/2010     55
4      1/23/2010     29
4      1/27/2010     44
4      1/29/2010     35
4      3/5/2010      4
4      3/14/2010     92
4      7/2/2010      4
4      8/9/2010      3
5      1/5/2010      2
5      1/23/2010     49
5      1/24/2010     14
5      9/5/2010      0
5      9/7/2010      11
5      9/12/2010     1
5      12/1/2010     4
5      12/31/2010    14
run;
proc summarydata=T2 nway%* Prepare data for merge;
 
class ID DATE2;
 
var EVENT_NUM;
 
outputout=T2A(keep=ID DATE2 EVENT_NUM) sum=;
run;

data T3; %* Merge;

  merge T1A T2A;
 
by ID DATE2;
  TOTAL= ifn(PRESENT=
1, max(0, EVENT_NUM ), .) ;
run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1085 views
  • 0 likes
  • 3 in conversation