BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

I want to create multiple rows based on time interval between two variables. I've the data as below. We have 'Duration' value in the source data which difference between 'Start_Mo' and 'End_Mo'. Value of 'End_Mo' will always be greater than 'Start_Mo'.

Source data:

data have;
    input Branch $ Segment $
        Start_Mo :anydtdte7. End_Mo :anydtdte7. Duration Id Cmpny $;
    format  start_mo end_mo yymmd7.;
    datalines;
71831234 7183_AS4     2017-03    2017-09          3     1234   Capgemini
71831248 7183_AS4     2016-12    2017-09          4     78901  TCS
71831234 7183_AS5     2017-03    2017-12          4     1234   Capgemini
71831234 7183_AS5     2017-06    2017-03          5     78901  TCS
;
run;

Now I want to create as many as rows based on the value from the variable 'Duration' and I've to create one more variable called 'Cal_Month' which should have values in quarters between time interval of the variables 'Start_Mo' and 'End_Mo'
with group by 'Segment' and 'Branch'

 

Desired Result is,

Branch	Segment	Start_Mo End_Mo	   Duration	Cmpny	       cal_month
71831234 7183_AS4 2017-03 2017-09	3	Capgemin	 2017-03
71831234 7183_AS4 2017-03 2017-09	3	Capgemin	 2017-06
71831234 7183_AS4 2017-03 2017-09	3	Capgemin	 2017-09
71831248 7183_AS4 2016-12 2017-09	4	TCS	         2016-12
71831248 7183_AS4 2016-12 2017-09	4	TCS	         2017-03
71831248 7183_AS4 2016-12 2017-09	4	TCS              2017-06
71831248 7183_AS4 2016-12 2017-09	4	TCS	         2017-09
71831234 7183_AS5 2017-03 2017-12	4	Capgemin	 2017-03
71831234 7183_AS5 2017-03 2017-12	4	Capgemin	 2017-06
71831234 7183_AS5 2017-03 2017-12	4	Capgemin	 2017-09
71831234 7183_AS5 2017-03 2017-12	4	Capgemin	 2017-12
71831234 7183_AS5 2017-03 2018-03	5	TCS	         2017-03
71831234 7183_AS5 2017-03 2018-03	5	TCS	         2017-06
71831234 7183_AS5 2017-03 2018-03	5	TCS	         2017-09
71831234 7183_AS5 2017-03 2018-03	5	TCS	         2017-12
71831234 7183_AS5 2017-03 2018-03	5	TCS	         2018-03

For example, values of 'Cal_Month' should be in 2017-03, 2017-06 and 2017-09 if 'Duration' value is 3 for each  'Segment'

 

In real life I've data with 'Duration' from 1 to 9 and 'Segment' is also have multiple values. Just to get the idea to tackle this Scenario, I just shared the sample data here. 

 

I tried with the code as below, but it's not producing the desired results.

data want;
    set have;
    by Segment;
    cal_month=start_mo;

    do until (cal_month > end_mo);
        output;
        cal_month=intnx('month',cal_month,duration);
    end;

    format cal_month yymmd7.;
run;

 

 

 

 

12 REPLIES 12
PeterClemmensen
Tourmaline | Level 20

Your logic is almost there. Try this

 

data want;
   set have;
   cal_month=start_mo;
   do while (cal_month <= End_Mo);
      output;
      cal_month = intnx('qtr', cal_month, 1, 's');
   end;
   format cal_month yymmd7.;
run;

 

Result (Edited to comply with correct sample data)

 

Branch   Segment  Start_Mo End_Mo  Duration Id    Cmpny    cal_month 
71831234 7183_AS4 2017-03  2017-09 3        1234  Capgemin 2017-03 
71831234 7183_AS4 2017-03  2017-09 3        1234  Capgemin 2017-06 
71831234 7183_AS4 2017-03  2017-09 3        1234  Capgemin 2017-09 
71831234 7183_AS4 2017-03  2017-12 4        78901 TCS      2017-03 
71831234 7183_AS4 2017-03  2017-12 4        78901 TCS      2017-06 
71831234 7183_AS4 2017-03  2017-12 4        78901 TCS      2017-09 
71831234 7183_AS4 2017-03  2017-12 4        78901 TCS      2017-12 
71831234 7183_AS5 2017-03  2017-12 4        1234  Capgemin 2017-03 
71831234 7183_AS5 2017-03  2017-12 4        1234  Capgemin 2017-06 
71831234 7183_AS5 2017-03  2017-12 4        1234  Capgemin 2017-09 
71831234 7183_AS5 2017-03  2017-12 4        1234  Capgemin 2017-12 
71831234 7183_AS5 2017-03  2018-03 5        78901 TCS      2017-03 
71831234 7183_AS5 2017-06  2018-03 5        78901 TCS      2017-06 
71831234 7183_AS5 2017-06  2018-03 5        78901 TCS      2017-09 
71831234 7183_AS5 2017-06  2018-03 5        78901 TCS      2017-12 
71831234 7183_AS5 2017-06  2018-03 5        78901 TCS      2018-03 

 

David_Billa
Rhodochrosite | Level 12

@PeterClemmensen thank you. But it seems do while is taking more time to process with 10000+ records. Is there a chance to do it with other ways?

PeterClemmensen
Tourmaline | Level 20

More time than what? You can do it with Do Until logic as you tried first, but I doubt that it'll be faster

 

data want;
   set have;
   cal_month=start_mo;
   do until (cal_month > End_Mo);
      output;
      cal_month = intnx('qtr', cal_month, 1, 's');
   end;
   format cal_month yymmd7.;
run;
Kurt_Bremser
Super User

@David_Billa wrote:

@PeterClemmensen thank you. But it seems do while is taking more time to process with 10000+ records. 


Only if you work with a mechanical computer from WW2. Anything that ever ran SAS does this in seconds, and nowadays in fractions of a second:

data have;
    input Branch $ Segment $
        Start_Mo :anydtdte7. End_Mo :anydtdte7. Duration Id Cmpny $;
    format  start_mo end_mo yymmd7.;
    do i = 1 to 10000;
      output;
    end;
    drop i;
    datalines;
71831234 7183_AS4     2017-03    2017-09          3     1234   Capgemini
71831234 7183_AS4     2017-03    2017-12          4     78901  TCS
71831234 7183_AS5     2017-03    2017-12          4     1234   Capgemini
71831234 7183_AS5     2017-06    2018-03          5     78901  TCS
;

data want;
   set have;
   cal_month=start_mo;
   do while (cal_month <= End_Mo);
      output;
      cal_month = intnx('qtr', cal_month, 1, 's');
   end;
   format cal_month yymmd7.;
run;

Log excerpt:

 88         data want;
 89            set have;
 90            cal_month=start_mo;
 91            do while (cal_month <= End_Mo);
 92               output;
 93               cal_month = intnx('qtr', cal_month, 1, 's');
 94            end;
 95            format cal_month yymmd7.;
 96         run;
 
 NOTE: There were 40000 observations read from the data set WORK.HAVE.
 NOTE: The data set WORK.WANT has 150000 observations and 8 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.16 seconds
       cpu time            0.15 seconds

But since you have a duration variable already, do it in an iterative do loop:

data want;
set have;
format cal_month yymmd7.;
do period = 1 to duration;
  cal_month = intnx('quarter',start_mo,period-1,'b');
  output;
end;
drop period;
run;

whic is even faster:

 73         data want;
 74         set have;
 75         format cal_month yymmd7.;
 76         do period = 1 to duration;
 77           cal_month = intnx('quarter',start_mo,period-1,'b');
 78           output;
 79         end;
 80         drop period;
 81         run;
 
 NOTE: There were 40000 observations read from the data set WORK.HAVE.
 NOTE: The data set WORK.WANT has 160000 observations and 8 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.09 seconds
       cpu time            0.08 seconds
David_Billa
Rhodochrosite | Level 12

Your code don't seem to work to produce the desired results. I should get 16 records in the desired Output but I could see only 11 after executing the code which you gave. I feel we still need to tweak the logic.

PeterClemmensen
Tourmaline | Level 20

The error is in the given sample data in the dates in the last row. This is the correct example data given your desired result.

 

Both of my posted code snippets produce the correct result.

 

data have;
    input Branch $ Segment $
        Start_Mo :anydtdte7. End_Mo :anydtdte7. Duration Id Cmpny $;
    format  start_mo end_mo yymmd7.;
    datalines;
71831234 7183_AS4     2017-03    2017-09          3     1234   Capgemini
71831234 7183_AS4     2017-03    2017-12          4     78901  TCS
71831234 7183_AS5     2017-03    2017-12          4     1234   Capgemini
71831234 7183_AS5     2017-03    2018-03          5     78901  TCS
;
run;
David_Billa
Rhodochrosite | Level 12

Yes, your code is working with the example data. When I apply your code in the real life data with 40 variables 10500 observations it keep on running. That's the reason, I asked you to help me with alternative methods. If possible, I would like to know how to achive the results with 'do until'

PeterClemmensen
Tourmaline | Level 20

This should not take forever. I suspect that this is a data problem. Do you have missing values for Start_Mo?

David_Billa
Rhodochrosite | Level 12
No, Start_mo and End_mo will never be missing. But there are other few
variables which are missing and I did not mention those variables in my
post as it is not useful for our calculation.
David_Billa
Rhodochrosite | Level 12
I believe you have assumed that the value of 'Start_Mo' will always be same for all records. But it's not the case in real life. Therefore in my last row of the sample data I gave other date value in 'Start_mo' which is different from other previous rows.

How can we tweak your ' do while' now?
PeterClemmensen
Tourmaline | Level 20

Can you update your sample data to be representable of your actual data?

David_Billa
Rhodochrosite | Level 12

@PeterClemmensen I have updated the sample data and also the desired result in my initial post. Please note that have to create as many as rows based on two variables 'Segment' and 'Branch' and this is also I have updated in my post. If you see the sample data now, I have updated the value of 'Branch' for one of the record in 'have' data.

 

Thanks in advance for your effort in helping me to identify the logic.

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2131 views
  • 2 likes
  • 3 in conversation