problems with splitting first id with 2 days as duration for 2 days

vraj1 · Posted 10-17-2018 09:19 AM

I use the below code to split the first id for first 2 observations. My ocde works for id 102 but not for id=101 as the first observation is only 2 days and the next one hsould split for 2 and the rest as it is. like below. any suggestion where i went wrong?

data WORK.want;
infile datalines dsd truncover;
input id:32. num:32. std:DATE9. endt:DATE9.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,13JAN2018
102,2,14JAN2018,15JAN2018
102,2,16JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;

data WORK.TEST;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;

data a_0;
 format std endt date9.;
      set TEST;
      diff=ENDT-STD;
      if diff>=3 then tmp=1;
      else tmp=2;
      diff1=int(diff/2);
run;

proc sort data=a_0;
      by id tmp num;
run;

data a_05;
      format std endt date9.;
      set a_0;
      by id tmp num;
      retain end1;
      format end1 date9.;
      if first.id then end1=.;
      if first.id then do;
            do i=1 to diff1;
            if i=1 then do; STD=STD; end1=STD+1; output; end;
            else if i < diff1 then do;
                  STD=end1+1; end1=STD+1; output; end;
            else do; STD=end1+1; end1=ENDT; output; end;
            end;
      end;
      else do; end1=ENDT; output ; end;
run;

Also i have another instance where i need to break the id from day 1to4 and next as 5 to 7. Can i use same logic somehow?

Kurt_Bremser · Posted 10-17-2018 09:47 AM

Why is 101,3 (which covers 9 days) split into just 2 observations, but 102,2 (which covers only 7 days) into 3?

Can you give us a plain-text rule for the splitting?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

vraj1 · Posted 10-17-2018 09:52 AM

The rule is i need to split the id as day1-2, day3-4, and if the id has same num the rest of days and the remaining as it is

another program in which scenario is to split by day1-4, day5-7 and remainaing as it is.

in my code which i used it works only if the first duration is more than 6 and does not work if the first record for id has only 2 day duration and the next one with different num.

101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018

Kurt_Bremser · Posted 10-17-2018 09:55 AM

That does in no way explain the difference between 101,2 and 102,2. What is the difference between those two?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

vraj1 · Posted 10-17-2018 10:12 AM

For id 101 the first record is starts at 12MAR2018 and end date 13MAR2018 and the next record 14MAR2018 to 23MAR2018 and i expect first scenario result as

101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018

101,4,24MAR2018,29MAR2018

but the code which comes from my code is

101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018

101,3,16MAR2018,17MAR2018

101,3,18MAR2018,19MAR2018

But for id=102 since i have duration of the first record more than 2 i.e 12JAN2018,19JAN2018 it splits properly.

Hope i explained it clearly.

Kurt_Bremser · Posted 10-17-2018 10:32 AM

So you need to run a counter. The first two observations for a given id, even with different num, should only cover 2 days each. From then on, any remaining rest is output and all subsequent observations remain as they are.

At least this code will produce your want dataset:

data have;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

data want;
set have (rename=(std=_std endt=_endt));
by id;
retain obscount;
if first.id
then obscount = 0;
format std endt yymmddd10.;
std = _std;
endt = std + 1;
do while (obscount < 2 and endt < _endt);
  output;
  std = std + 2;
  endt = min(endt + 2,_endt);
  obscount + 1;
end;
if endt < _endt or std = _std
then do;
  endt = _endt;
  output;
  obscount + 1;
end;
drop obscount _std _endt;
run;

proc print data=want noobs;
run;

Result:

 id    num           std          endt

101     2     2018-03-12    2018-03-13
101     3     2018-03-14    2018-03-15
101     3     2018-03-16    2018-03-23
101     4     2018-03-24    2018-03-29
102     2     2018-01-12    2018-01-13
102     2     2018-01-14    2018-01-15
102     2     2018-01-16    2018-01-19
102     3     2018-01-20    2018-01-28
102     4     2018-01-29    2018-02-12
102     5     2018-02-13    2018-02-18
102     6     2018-02-19    2018-02-28

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

vraj1 · Posted 10-17-2018 10:42 AM

Thanks a lot. it works.

There is another scenario where i should split the first 2 records on day1-4 and day 5-7.

Can i somehow use the same program with changes?

mkeintz · Posted 10-17-2018 10:35 AM

Then what do you do when the first record is only 1 day. I notice that your output file identifies a single record number source for each of your resulting periods. But if you have single day records, that constructing 2-day intervals at the start of each id, requires multiple source record numbers.

And I have the same question if the first record is a 3-day span.

Please give output example for the following, with output variables NUM, STD, and ENDT. Or if these are not possible input conditions, let us know.

first record has 1 day, 2nd has 3 or more

first and second record each have 2 days

first record has 3 days.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

vraj1 · Posted 10-17-2018 11:17 AM

the second scenario i need is to split day1-4 and day5-7 from the first id

data have;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

data have;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,15MAR2018
101,3,16MAR2018,18MAR2018
101,3,19MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,15JAN2018
102,2,16JAN2018,18JAN2018
102,2,19JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

mkeintz · Posted 10-17-2018 11:30 AM

How can we advise you on the 2nd variation of the problem when we don't yet know whether all the rules for the first variation are known? That's why I posted my questions earlier (and why @Kurt_Bremser also posted questions).

In particular, you have not said what you want the output to look like if a single incoming record is not long enough to satisfy your output date range requirements at the beginning of each id. If we don't know the answer for 2 followed by 2, then satisfying the 4 followed by 3 is an exercise in futility.

Help us help you. If you don't understand why a question is being asked, let us know. Otherwise ignoring the question requires us to be mind readers, a skill I'd rather leave unused.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

vraj1 · Posted 10-18-2018 03:11 AM

Sorry for not being clear.

The first variation worked as per Kurt's code and the second one i need to split in similar way as variation 1 but split day1-4 and day5-7 of the first id

if the data is like below

data have;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

i neeed it to split like below which is irrespective of the first id duration it should split on day1-4 i.e in the output below 101,2,12MAR2018,15MAR2018
101,3,16MAR2018,18MAR2018 and the rest of the records for the id as it is. First variation was to split in day1-2 and day3-4 and the rest as it is and the code works. i was wondering how to do the second variation. Hope i am clear and sorry for the confusion

data want;
  infile datalines dsd truncover;
  input id:32. num:32. std:DATE9. endt:DATE9.;
  format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,15MAR2018
101,3,16MAR2018,18MAR2018
101,3,19MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,15JAN2018
102,2,16JAN2018,18JAN2018
102,2,19JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;

problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

Re: problems with splitting first id with 2 days as duration for 2 days

SAS Innovate 2025: Call for Content

Classroom Training Available!