BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sashelppls
Calcite | Level 5

I have two datasets as follows:

 

 

 

 

data end_dates;
input group_id $ date :ddmmyy10.;
format date ddmmyy10.;
datalines;
2AAAB9 01/04/2020 
2AAAC4 02/04/2019
2AAAC9 05/10/2020
2AAAD2 12/19/2018
2AAAM1 08/23/2017
;
run;

 

 

 

 

 

data subjects;
input group_id $ open_date :ddmmyy10.;
format open_date ddmmyy10.;
datalines;
2AAAA0 12/29/2017
2AAAA2 5/14/2014
2AAAC9 2/04/2020
2AAAD1 1/03/2020
2AAAD2 1/29/2019
;
run;

I want to create two new variables in the subjects dataset as follows (I very rarely use SAS and have had trouble, but here's some pseudocode essentially of what I want to do):

 

 

Var Before:If the group ID in the subjects dataset is in the end_dates dataset, and the open_date in the subjects dataset is before the date variable in the end_dates dataset, the value will be 1, otherwise it's 0.

Var Within_month:If the group ID in the subjects dataset is in the end_dates dataset, and the open_date in the subjects dataset is either the same month or one month after (just looking at the month, not necessarily within 31 days) of the date variable in the end_dates dataset, the value will be 1, otherwise it's 0.

 

So then the dataset I would have is as follows:

data subjects; 
input group_id $ open_date :ddmmyy10. before within_month; 
format open_date ddmmyy10.; 
datalines; 
2AAAA0 12/29/2017 0 0 
2AAAA2 5/14/2014 0 0 
2AAAC9 2/04/2020 1 0 
2AAAD1 1/03/2020 0 0 
2AAAD2 1/29/2019 0 1 ; 
run;

 

I believe I would use the MONTH and INTNX functions for the within_month variable and the before variable seems pretty easy to compare, but I have no idea how to check if the group_id from the subjects dataset is in the end_dates dataset, and if it is, compare the open_date from the subjects data to the end_date of that particular observation. I have hundreds of records in the end_dates dataset so not like I can just hard code it. 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Hint number 1: Provide data step code that is correct. When I run your data steps:

1    data end_dates;
2    input group_id $ date :ddmmyy10.;
3    format date ddmmyy10.;
4    datalines;

NOTE: Invalid data for date in line 8 8-17.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
8          2AAAD2 12/19/2018
group_id=2AAAD2 date=. _ERROR_=1 _N_=4
NOTE: Invalid data for date in line 9 8-17.
9          2AAAM1 08/23/2017
group_id=2AAAM1 date=. _ERROR_=1 _N_=5
NOTE: The data set WORK.END_DATES has 5 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


10   ;
11
12
13   data subjects;
14   input group_id $ open_date :ddmmyy10.;
15   format open_date ddmmyy10.;
16   datalines;

NOTE: Invalid data for open_date in line 17 8-17.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
17         2AAAA0 12/29/2017
group_id=2AAAA0 open_date=. _ERROR_=1 _N_=1
NOTE: Invalid data for open_date in line 18 8-16.
18         2AAAA2 5/14/2014
group_id=2AAAA2 open_date=. _ERROR_=1 _N_=2
NOTE: Invalid data for open_date in line 21 8-16.
21         2AAAD2 1/29/2019
group_id=2AAAD2 open_date=. _ERROR_=1 _N_=5
NOTE: The data set WORK.SUBJECTS has 5 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


22   ;

Since the values you show have things like 19, 23 and 29 in the second position of the date values your Informat almost certainly should be MMDDYY10, or you have a mixed layout of your date values and more headaches.

 

Second, your verbiage actually seems to say to set the Before and within_month values only when the group ids match but that is not what your example desired output shows.

 

This does what you ask:

data end_dates;
input group_id $ date :mmddyy10.;
format date ddmmyy10.;
datalines;
2AAAB9 01/04/2020 
2AAAC4 02/04/2019
2AAAC9 05/10/2020
2AAAD2 12/19/2018
2AAAM1 08/23/2017
;


data subjects;
   input group_id $ open_date :mmddyy10.;
   format open_date ddmmyy10.;
datalines;
2AAAA0 12/29/2017
2AAAA2 5/14/2014
2AAAC9 2/04/2020
2AAAD1 1/03/2020
2AAAD2 1/29/2019
;


data combined;
   merge subjects (in=insub)
         end_dates (in=inend)
   ;
   by group_id;
   if insub and inend then do;
      before = (open_date < date);
      within_month = (intnx('month',date,0,'B') le open_date le intnx('month',date,1,'E'));
   end;
   if missing(before) then before=0;
   if missing(within_month) then within_month=0;
   if insub;
   drop date;
run;

Srtongly suggest creating a new data set instead of attempting to reuse Subjects. A logic error or typo can destroy your existing data set and might cause considerable work to recover it.

The key bits above are the IN= options to the data sets on the MERGE. This creates temporary variables that indicate whether the current record as merged has elements from which records and can be used to test when the group comes from both sets.

 

Caution: if both of your data sets contain multiple records with the same group_id this likely will not work as the Data step merge is not the tool when that is the case.

 

 

View solution in original post

5 REPLIES 5
ballardw
Super User

Hint number 1: Provide data step code that is correct. When I run your data steps:

1    data end_dates;
2    input group_id $ date :ddmmyy10.;
3    format date ddmmyy10.;
4    datalines;

NOTE: Invalid data for date in line 8 8-17.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
8          2AAAD2 12/19/2018
group_id=2AAAD2 date=. _ERROR_=1 _N_=4
NOTE: Invalid data for date in line 9 8-17.
9          2AAAM1 08/23/2017
group_id=2AAAM1 date=. _ERROR_=1 _N_=5
NOTE: The data set WORK.END_DATES has 5 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


10   ;
11
12
13   data subjects;
14   input group_id $ open_date :ddmmyy10.;
15   format open_date ddmmyy10.;
16   datalines;

NOTE: Invalid data for open_date in line 17 8-17.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
17         2AAAA0 12/29/2017
group_id=2AAAA0 open_date=. _ERROR_=1 _N_=1
NOTE: Invalid data for open_date in line 18 8-16.
18         2AAAA2 5/14/2014
group_id=2AAAA2 open_date=. _ERROR_=1 _N_=2
NOTE: Invalid data for open_date in line 21 8-16.
21         2AAAD2 1/29/2019
group_id=2AAAD2 open_date=. _ERROR_=1 _N_=5
NOTE: The data set WORK.SUBJECTS has 5 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


22   ;

Since the values you show have things like 19, 23 and 29 in the second position of the date values your Informat almost certainly should be MMDDYY10, or you have a mixed layout of your date values and more headaches.

 

Second, your verbiage actually seems to say to set the Before and within_month values only when the group ids match but that is not what your example desired output shows.

 

This does what you ask:

data end_dates;
input group_id $ date :mmddyy10.;
format date ddmmyy10.;
datalines;
2AAAB9 01/04/2020 
2AAAC4 02/04/2019
2AAAC9 05/10/2020
2AAAD2 12/19/2018
2AAAM1 08/23/2017
;


data subjects;
   input group_id $ open_date :mmddyy10.;
   format open_date ddmmyy10.;
datalines;
2AAAA0 12/29/2017
2AAAA2 5/14/2014
2AAAC9 2/04/2020
2AAAD1 1/03/2020
2AAAD2 1/29/2019
;


data combined;
   merge subjects (in=insub)
         end_dates (in=inend)
   ;
   by group_id;
   if insub and inend then do;
      before = (open_date < date);
      within_month = (intnx('month',date,0,'B') le open_date le intnx('month',date,1,'E'));
   end;
   if missing(before) then before=0;
   if missing(within_month) then within_month=0;
   if insub;
   drop date;
run;

Srtongly suggest creating a new data set instead of attempting to reuse Subjects. A logic error or typo can destroy your existing data set and might cause considerable work to recover it.

The key bits above are the IN= options to the data sets on the MERGE. This creates temporary variables that indicate whether the current record as merged has elements from which records and can be used to test when the group comes from both sets.

 

Caution: if both of your data sets contain multiple records with the same group_id this likely will not work as the Data step merge is not the tool when that is the case.

 

 

sashelppls
Calcite | Level 5

Thank you for this detailed answer! Yes, sorry about the typo, it should be mmddyy format.

 

I do have multiples with the same group_id in the subjects dataset but not the end_dates dataset. Would a proc sql left join work in that case? Or can I still do a merge?

Tom
Super User Tom
Super User

One to Many works fine with MERGE statement.

sashelppls
Calcite | Level 5

Got it. I take it I still have to sort both datasets before merging, right?

ballardw
Super User

@sashelppls wrote:

Got it. I take it I still have to sort both datasets before merging, right?


There are times when you want to use a BY statement that isn't sorted and SAS allows this with the NOTSORTED option. This is certainly not one of those.

 

Merging by a variable wants sorted data. Merge and Update will not allow the NOTSORTED option.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 512 views
  • 2 likes
  • 3 in conversation