I use the below code to split the first id for first 2 observations. My ocde works for id 102 but not for id=101 as the first observation is only 2 days and the next one hsould split for 2 and the rest as it is. like below. any suggestion where i went wrong?
data WORK.want;
infile datalines dsd truncover;
input id:32. num:32. std:DATE9. endt:DATE9.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,13JAN2018
102,2,14JAN2018,15JAN2018
102,2,16JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
data WORK.TEST; infile datalines dsd truncover; input id:32. num:32. std:DATE9. endt:DATE9.; datalines4; 101,2,12MAR2018,13MAR2018 101,3,14MAR2018,23MAR2018 101,4,24MAR2018,29MAR2018 102,2,12JAN2018,19JAN2018 102,3,20JAN2018,28JAN2018 102,4,29JAN2018,12FEB2018 102,5,13FEB2018,18FEB2018 102,6,19FEB2018,28FEB2018 ;;;; data a_0; format std endt date9.; set TEST; diff=ENDT-STD; if diff>=3 then tmp=1; else tmp=2; diff1=int(diff/2); run; proc sort data=a_0; by id tmp num; run; data a_05; format std endt date9.; set a_0; by id tmp num; retain end1; format end1 date9.; if first.id then end1=.; if first.id then do; do i=1 to diff1; if i=1 then do; STD=STD; end1=STD+1; output; end; else if i < diff1 then do; STD=end1+1; end1=STD+1; output; end; else do; STD=end1+1; end1=ENDT; output; end; end; end; else do; end1=ENDT; output ; end; run;
Also i have another instance where i need to break the id from day 1to4 and next as 5 to 7. Can i use same logic somehow?
Why is 101,3 (which covers 9 days) split into just 2 observations, but 102,2 (which covers only 7 days) into 3?
Can you give us a plain-text rule for the splitting?
The rule is i need to split the id as day1-2, day3-4, and if the id has same num the rest of days and the remaining as it is
another program in which scenario is to split by day1-4, day5-7 and remainaing as it is.
in my code which i used it works only if the first duration is more than 6 and does not work if the first record for id has only 2 day duration and the next one with different num.
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
That does in no way explain the difference between 101,2 and 102,2. What is the difference between those two?
For id 101 the first record is starts at 12MAR2018 and end date 13MAR2018 and the next record 14MAR2018 to 23MAR2018 and i expect first scenario result as
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
but the code which comes from my code is
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,15MAR2018
101,3,16MAR2018,17MAR2018
101,3,18MAR2018,19MAR2018
But for id=102 since i have duration of the first record more than 2 i.e 12JAN2018,19JAN2018 it splits properly.
Hope i explained it clearly.
So you need to run a counter. The first two observations for a given id, even with different num, should only cover 2 days each. From then on, any remaining rest is output and all subsequent observations remain as they are.
At least this code will produce your want dataset:
data have;
infile datalines dsd truncover;
input id:32. num:32. std:DATE9. endt:DATE9.;
format std endt yymmddd10.;
datalines4;
101,2,12MAR2018,13MAR2018
101,3,14MAR2018,23MAR2018
101,4,24MAR2018,29MAR2018
102,2,12JAN2018,19JAN2018
102,3,20JAN2018,28JAN2018
102,4,29JAN2018,12FEB2018
102,5,13FEB2018,18FEB2018
102,6,19FEB2018,28FEB2018
;;;;
run;
data want;
set have (rename=(std=_std endt=_endt));
by id;
retain obscount;
if first.id
then obscount = 0;
format std endt yymmddd10.;
std = _std;
endt = std + 1;
do while (obscount < 2 and endt < _endt);
output;
std = std + 2;
endt = min(endt + 2,_endt);
obscount + 1;
end;
if endt < _endt or std = _std
then do;
endt = _endt;
output;
obscount + 1;
end;
drop obscount _std _endt;
run;
proc print data=want noobs;
run;
Result:
id num std endt 101 2 2018-03-12 2018-03-13 101 3 2018-03-14 2018-03-15 101 3 2018-03-16 2018-03-23 101 4 2018-03-24 2018-03-29 102 2 2018-01-12 2018-01-13 102 2 2018-01-14 2018-01-15 102 2 2018-01-16 2018-01-19 102 3 2018-01-20 2018-01-28 102 4 2018-01-29 2018-02-12 102 5 2018-02-13 2018-02-18 102 6 2018-02-19 2018-02-28
Thanks a lot. it works.
There is another scenario where i should split the first 2 records on day1-4 and day 5-7.
Can i somehow use the same program with changes?
Then what do you do when the first record is only 1 day. I notice that your output file identifies a single record number source for each of your resulting periods. But if you have single day records, that constructing 2-day intervals at the start of each id, requires multiple source record numbers.
And I have the same question if the first record is a 3-day span.
Please give output example for the following, with output variables NUM, STD, and ENDT. Or if these are not possible input conditions, let us know.
first record has 1 day, 2nd has 3 or more
first and second record each have 2 days
first record has 3 days.
the second scenario i need is to split day1-4 and day5-7 from the first id
data have; infile datalines dsd truncover; input id:32. num:32. std:DATE9. endt:DATE9.; format std endt yymmddd10.; datalines4; 101,2,12MAR2018,13MAR2018 101,3,14MAR2018,23MAR2018 101,4,24MAR2018,29MAR2018 102,2,12JAN2018,19JAN2018 102,3,20JAN2018,28JAN2018 102,4,29JAN2018,12FEB2018 102,5,13FEB2018,18FEB2018 102,6,19FEB2018,28FEB2018 ;;;; run; data have; infile datalines dsd truncover; input id:32. num:32. std:DATE9. endt:DATE9.; format std endt yymmddd10.; datalines4; 101,2,12MAR2018,15MAR2018 101,3,16MAR2018,18MAR2018 101,3,19MAR2018,23MAR2018 101,4,24MAR2018,29MAR2018 102,2,12JAN2018,15JAN2018 102,2,16JAN2018,18JAN2018 102,2,19JAN2018,19JAN2018 102,3,20JAN2018,28JAN2018 102,4,29JAN2018,12FEB2018 102,5,13FEB2018,18FEB2018 102,6,19FEB2018,28FEB2018 ;;;; run;
How can we advise you on the 2nd variation of the problem when we don't yet know whether all the rules for the first variation are known? That's why I posted my questions earlier (and why @Kurt_Bremser also posted questions).
In particular, you have not said what you want the output to look like if a single incoming record is not long enough to satisfy your output date range requirements at the beginning of each id. If we don't know the answer for 2 followed by 2, then satisfying the 4 followed by 3 is an exercise in futility.
Help us help you. If you don't understand why a question is being asked, let us know. Otherwise ignoring the question requires us to be mind readers, a skill I'd rather leave unused.
Sorry for not being clear.
The first variation worked as per Kurt's code and the second one i need to split in similar way as variation 1 but split day1-4 and day5-7 of the first id
if the data is like below
data have; infile datalines dsd truncover; input id:32. num:32. std:DATE9. endt:DATE9.; format std endt yymmddd10.; datalines4; 101,2,12MAR2018,13MAR2018 101,3,14MAR2018,23MAR2018 101,4,24MAR2018,29MAR2018 102,2,12JAN2018,19JAN2018 102,3,20JAN2018,28JAN2018 102,4,29JAN2018,12FEB2018 102,5,13FEB2018,18FEB2018 102,6,19FEB2018,28FEB2018 ;;;; run;
i neeed it to split like below which is irrespective of the first id duration it should split on day1-4 i.e in the output below 101,2,12MAR2018,15MAR2018
101,3,16MAR2018,18MAR2018 and the rest of the records for the id as it is. First variation was to split in day1-2 and day3-4 and the rest as it is and the code works. i was wondering how to do the second variation. Hope i am clear and sorry for the confusion
data want; infile datalines dsd truncover; input id:32. num:32. std:DATE9. endt:DATE9.; format std endt yymmddd10.; datalines4; 101,2,12MAR2018,15MAR2018 101,3,16MAR2018,18MAR2018 101,3,19MAR2018,23MAR2018 101,4,24MAR2018,29MAR2018 102,2,12JAN2018,15JAN2018 102,2,16JAN2018,18JAN2018 102,2,19JAN2018,19JAN2018 102,3,20JAN2018,28JAN2018 102,4,29JAN2018,12FEB2018 102,5,13FEB2018,18FEB2018 102,6,19FEB2018,28FEB2018 ;;;; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.