BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
aguilar_john
Obsidian | Level 7

Hello everyone,

 

I have a table which indicates me for every ID the number of events happened at a specific time. The number of the events indicate me for many monh the event was active and also when it started. So in my table have2 for example I see for ID=1 that the event occured first in November 2020 at lastest for 2 months (November+December), for ID=2 it started in January 2021 and only lasted 1 month etc.

 

data have;
	input id date: date9. number;
	format  date date9.;
	datalines;
1 01JAN2021 2
2 01FEB2021 1
3 01JUN2021 3
;
run;


data have2;
set have;
format start_month MONYY.;
start_month=intnx("month",date,-number);
run;

My goal is to have for every ID, starting with the first month the event occured, an entry of 0 or 1 which indicates me, if for that month the event was active or not. The idea is to have the table recreated every month based on the information provided in the have table and include another mohthly entry either with 0 or 1. 

So to have a table like this:

ID Month flag
1 01.11.2020 1
1 01.12.2020 1
1 01.01.2021 0
1 01.02.2021 0
1 01.03.2021 0
1 01.04.2021 0
1 01.05.2021 0
1 01.06.2021 0
2 01.01.2021 1
2 01.02.2021 0
2 01.03.2021 0
2 01.04.2021 0
2 01.05.2021 0
2 01.06.2021 0
3 01.03.2021 1
3 01.04.2021 1
3 01.05.2021 1
3 01.06.2021 0

 

Maybe some of you have a good idea, how to solve that.

Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Please don't use a date like 01.11.2020 without telling use which is month and which is day.

 

Second please make sure that any example data can be used to create the desired output. Since your example for ID=1 has no dates involving 2020 it is extremely hard to tell what may be going on to get the desired output.

It helps to show ALL the steps needed in generating your output data. Such as why are there dates to 01.06.2021? There is no rule involved mentioned

 

A guess:

data want;
   set have;
   startmonth = intnx('month',date,-1*number,'b') ;
   temp= -1*number; 
   /* the next loop stops after output for Jun 2021*/
   do until (startmonth= '01JUL2021'd);
      flag=  (temp<0);    
      output;
      temp=temp+1;
      startmonth=intnx('month',startmonth,1,'b');
   end;
   drop temp;
   format startmonth date9.;
run;

View solution in original post

6 REPLIES 6
ballardw
Super User

Please don't use a date like 01.11.2020 without telling use which is month and which is day.

 

Second please make sure that any example data can be used to create the desired output. Since your example for ID=1 has no dates involving 2020 it is extremely hard to tell what may be going on to get the desired output.

It helps to show ALL the steps needed in generating your output data. Such as why are there dates to 01.06.2021? There is no rule involved mentioned

 

A guess:

data want;
   set have;
   startmonth = intnx('month',date,-1*number,'b') ;
   temp= -1*number; 
   /* the next loop stops after output for Jun 2021*/
   do until (startmonth= '01JUL2021'd);
      flag=  (temp<0);    
      output;
      temp=temp+1;
      startmonth=intnx('month',startmonth,1,'b');
   end;
   drop temp;
   format startmonth date9.;
run;
aguilar_john
Obsidian | Level 7

Thank you for your answer. My example for Id=1 does not include any data for 2020 because that is exactly how the data is set up, for some id's there is no datapoint before.

01JUN2021 was set randomly, it could very well also be another date, it always would be based on a predefined report date, in this case it was 01JUN2021. 

Your code works very well, thank you very much!

 

I have one follow-up question where you maybe also have an idea on how to solve it. If for the same id I would multiple entries e.g. for id=1 for APR2021:

data have;
	input id date: date9. number;
	format  date date9.;
	datalines;
1 01JAN2021 2
2 01FEB2021 1
3 01JUN2021 3
1 01APR2021 1
;
run;

My result will have double entries for the month's MAR2021 until JUN2021, with the entry for MAR2021 one time with flag=1 and the other entry with a flag=0. Is there a way to solve that in a way so that the end table would look like this:

id date number startmonth flag
1 01Jan2021 2 01Nov2020 1
1 01Jan2021 2 01Dec2020 1
1 01Jan2021 2 01Jan2021 0
1 01Jan2021 2 01Feb2021 0
1 01Apr2021 1 01Mar2021 1
1 01Apr2021 1 01Apr2021 0
1 01Apr2021 1 01May2021 0
1 01Apr2021 1 01Jun2021 0

instead of this:

id date number startmonth flag
1 01Jan2021 2 01Nov2020 1
1 01Jan2021 2 01Dec2020 1
1 01Jan2021 2 01Jan2021 0
1 01Jan2021 2 01Feb2021 0
1 01Jan2021 2 01Mar2021 0
1 01Jan2021 2 01Apr2021 0
1 01Jan2021 2 01May2021 0
1 01Jan2021 2 01Jun2021 0
1 01Apr2021 1 01Mar2021 1
1 01Apr2021 1 01Apr2021 0
1 01Apr2021 1 01May2021 0
1 01Apr2021 1 01Jun2021 0

 

So to say that if there is a new entry which would implicate another information for the same event but does not overwrite the already existing entries. So in this case the event for ID=1 occurred for NOV2020 and DEC2020 and again for MAR2021.

ballardw
Super User

You now a much more complicated problem. One of the first bits is why is Mar 2021 not in the data for number 2 with Id 1?

You need to get a lot more detailed in your description of how the output is generated. I am afraid this is one of the case where I could duplicate the appearance for that example data but the approach might not work in a generic sense because the explicit rules to implement are not stated.

 

What ever approach, the first bit will be to sort your data by the Id and date.

Then it appears you would want to do a "look ahead" to see if the next record is the same ID and the date with its number offsed before the arbitrary end date. If such a condition is true then use the next date/offset less one month as the end instead of the arbitrary date.

Maybe.

 

 

aguilar_john
Obsidian | Level 7

Ok thanks for the feedback, I try to give some more details and then it may become clearer: The goal at the end would be, to have a table which shows per id only one entry of the startmonth and the corresponding flag. The flag should only show 1 if (based on the have table) the underlying event ocurred. Since the have table can contain multiple entries per id (since the event can happen more than once within the underlying period), I want to sort of "overwrite" the information for a already existing startmonth if the have table was update and a new dataline is included. That is the reason for id=1,MAR2021, number=2 not beeing included in the desired table. 

Based on the entry 

1 01JAN2021 2

from table have, MAR2021 would show flag=0. Since entry

1 01APR2021 1

also exists, MAR2021 would not have flag=0 anymore but now flag=1. As long as the second entry for id=1 does not exist it is no problem that flag=0 for id=1 in MAR2021.

If for example the second entry for id=1 would be

1 01FEB2021 2

then I would need to have NOV2020, DEC2020 and JAN2021 with flag=1 and the rest until end date with flag=0. 

Does that give you more details to the desired outcome?

 

I am not quite sure what exactly you mean by "look ahead". 

PhilC
Rhodochrosite | Level 12

I am confused, but I think I see a pattern when flag=1, but I wouldn't know how you select what months to put in the table when flag=0.  Perhaps you can explain that for us?

 

otherwise I offer this as a partial solution:

data have;
	input id date: date9. number;
	format  date date9.;
	datalines;
1 01JAN2021 2
2 01FEB2021 1
3 01JUN2021 3
1 01APR2021 1
;
proc sort ;
  by ID date;
data want;
  set have;
    by id date;
  startmonth=date;
    format  startmonth date9.;
  flag=1;
  do i=number by -1 until (i=1); drop i;
    startmonth=intnx('month',startmonth,-1);
    output;
  end;
proc sort nodup;
  by ID date startmonth;
run;

 

aguilar_john
Obsidian | Level 7
Thank you for your approach, it does not always work for my scenarios. But I solved it differently. I firstly created a datafile with all ID's and months going forward, then I only join the desired entries to my "base table".

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1025 views
  • 0 likes
  • 3 in conversation