BookmarkSubscribeRSS Feed
vraj1
Quartz | Level 8

I use the below code to split the first id for first 2 observations. My ocde works for id 102 but not for id=101 as the first observation is only 2 days and the next one hsould split for 2 and the rest as it is. like below. any suggestion where i went wrong?

 

data WORK.want;
infile datalines dsd truncover;
input id:32. num:32. std:DATE9. endt:DATE9.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,13JAN2018
102,2,14JAN2018,15JAN2018
102,2,16JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;

 

data WORK.TEST;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;

data a_0;
 format std endt date9.;
      set TEST;
      diff=ENDT-STD;
      if diff>=3 then tmp=1;
      else tmp=2;
      diff1=int(diff/2);
run;

proc sort data=a_0;
      by id tmp num;
run;

data a_05;
      format std endt date9.;
      set a_0;
      by id tmp num;
      retain end1;
      format end1 date9.;
      if first.id then end1=.;
      if first.id then do;
            do i=1 to diff1;
            if i=1 then do; STD=STD; end1=STD+1; output; end;
            else if i < diff1 then do;
                  STD=end1+1; end1=STD+1; output; end;
            else do; STD=end1+1; end1=ENDT; output; end;
            end;
      end;
      else do; end1=ENDT; output ; end;
run;

 

Also i have another instance where i need to break  the id from day 1to4 and next as 5 to 7. Can i use same logic somehow?

10 REPLIES 10
Kurt_Bremser
Super User

Why is 101,3 (which covers 9 days) split into just 2 observations, but 102,2 (which covers only 7 days) into 3?

Can you give us a plain-text rule for the splitting?

vraj1
Quartz | Level 8

The rule is i need to split the id as day1-2, day3-4, and if the id has same num the rest of days and the remaining as it is

another program  in which  scenario is to split by day1-4, day5-7 and remainaing as it is.

 in my code which i used it works only if the first duration is more than 6 and does not work if the first record for id has only 2 day duration and the next one with different num.

101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018

 

 

 

vraj1
Quartz | Level 8

For id 101 the first record is starts at 12MAR2018 and end date 13MAR2018 and the next record  14MAR2018 to 23MAR2018 and i expect first scenario result as

101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018

101,4,24MAR2018,29MAR2018

 

but the code which comes from my code is 

101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018

101,3,16MAR2018,17MAR2018

101,3,18MAR2018,19MAR2018

 

But for id=102 since i have duration of the first record more than 2 i.e 12JAN2018,19JAN2018 it splits properly.

 

Hope i explained it clearly.

 

 

Kurt_Bremser
Super User

So you need to run a counter. The first two observations for a given id, even with different num, should only cover 2 days each. From then on, any remaining rest is output and all subsequent observations remain as they are.

At least this code will produce your want dataset:

data have;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

data want;
set have (rename=(std=_std endt=_endt));
by id;
retain obscount;
if first.id
then obscount = 0;
format std endt yymmddd10.;
std = _std;
endt = std + 1;
do while (obscount < 2 and endt < _endt);
  output;
  std = std + 2;
  endt = min(endt + 2,_endt);
  obscount + 1;
end;
if endt < _endt or std = _std
then do;
  endt = _endt;
  output;
  obscount + 1;
end;
drop obscount _std _endt;
run;

proc print data=want noobs;
run;

Result:

 id    num           std          endt

101     2     2018-03-12    2018-03-13
101     3     2018-03-14    2018-03-15
101     3     2018-03-16    2018-03-23
101     4     2018-03-24    2018-03-29
102     2     2018-01-12    2018-01-13
102     2     2018-01-14    2018-01-15
102     2     2018-01-16    2018-01-19
102     3     2018-01-20    2018-01-28
102     4     2018-01-29    2018-02-12
102     5     2018-02-13    2018-02-18
102     6     2018-02-19    2018-02-28
vraj1
Quartz | Level 8

Thanks a lot. it works. 

There is another scenario where i should split the first 2 records on day1-4 and day 5-7.

 

Can i somehow use the same program with changes?

mkeintz
PROC Star

Then what do you do when the first record is only 1 day.  I notice that your output file identifies a single record number source for each of your resulting periods.  But if you have single day records, that constructing 2-day intervals at the start of each id, requires multiple source record numbers.

 

And I have the same question if the first record is a 3-day span.

 

Please give output example for the following, with output variables NUM, STD, and ENDT.  Or if these are not possible input conditions, let us know.

 

first record has 1 day, 2nd has 3 or more

first and second record each have 2 days

first record has 3 days.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
vraj1
Quartz | Level 8

the second scenario i need is to split day1-4 and day5-7 from the first id

data have;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

data have;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,15MAR2018
101,3,16MAR2018,18MAR2018
101,3,19MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,15JAN2018
102,2,16JAN2018,18JAN2018
102,2,19JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;
mkeintz
PROC Star

How can we advise you on the 2nd variation of the problem when we don't yet know whether all the rules for the first variation are known?  That's why I posted my questions earlier (and why @Kurt_Bremser also posted questions).

 

In particular, you have not said what you want the output to look like if a single incoming record is not long enough to satisfy your output date range requirements at the beginning of each id.  If we don't know the answer for 2 followed by 2, then satisfying the 4 followed by 3 is an exercise in futility.

 

Help us help you.  If you don't understand why a question is being asked, let us know.  Otherwise ignoring the question requires us to be mind readers, a skill I'd rather leave unused.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
vraj1
Quartz | Level 8

Sorry for not being clear.

The first variation worked as per Kurt's code and the second one i need to split in similar way as variation 1 but split day1-4 and day5-7  of the first id 

if the data is like below

data have;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

i neeed it to split like below which is irrespective of the first id duration it should split on day1-4 i.e in the output below 101,2,12MAR2018,15MAR2018
101,3,16MAR2018,18MAR2018 and the rest of the records for the id as it is. First variation was to split in day1-2 and day3-4 and the rest as it is and the code works. i was wondering how to do the second variation. Hope i am clear and sorry for the confusion

data want;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,15MAR2018
101,3,16MAR2018,18MAR2018
101,3,19MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,15JAN2018
102,2,16JAN2018,18JAN2018
102,2,19JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

  

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2082 views
  • 0 likes
  • 3 in conversation