BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
varun11494
Fluorite | Level 6

how do i remove the row having the same data and also create new rows by splitting
for example:insurance policy data
Policyno.   start_date      end_date       difference in dates
     1           10mar2016   09mar2017         1year

     1           10mar2016   09mar2017          1year

     2           01apr2016   31jan2018             1year 10months

     3           05may2016  04may2017          1 year

     3           05may2017  04may2018          1 year

what i want:

1)remove the field having the same data for eg of policyno. 1.I want only one field with unique data.
2)the difference should be not more than one year,if so then split the data into 2 rows or 3 rows depending on the difference for eg
   in policy no.2 difference is 1 year 10 months i want it to be split into 2 rows.
  Policyno.   start_date      end_date       difference in dates  
     2           01apr2016   31mar2017         1year

     2           01apr2017  31jan2018           10 months(which is less than 1 year)

 

thankyou in advance

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This is a good problem for using an OUTPUT statement inside a do loop, and also as an example of a do loop defined in this structure:

 

   do x=1 by 1 while (some other condition).

 

Notes:

  1. For multiyear policies, the program below constantly adjusts START_DATE and END_DATE until START_DATE exceeds the original end_date (e_date in the program).
  2. This program assumes no policy exceed 10 years ("N=1 to 10").
  3. The intck function uses the "continuous" method argument, which means it counts date "boudaries" based on the first date argument, not on the normal calendar month boundary, nor on a user-specified "boundary-shift" parameter.  Great for getting anniveraries.
  4. The CATX function concatenates the 2nd, 3rd, etc arguments, and uses the first argument as a separator.

 

data want (drop=e_date n);
  set have;
  by policyno start_date end_date;
  if first.end_date;

  e_date=end_date;
  do N=1 to 10 while (start_date < e_date);
    end_date=min(e_date,intnx('year',start_date,1,'s')-1);
    length date_dif $20;
    date_dif='One Year';
    if end_date<intnx('year',start_date,1,'s')-1 then 
      date_dif=catx(' ',intck('month',start_date,end_date+1,'continuous'),'months');
    output;
    start_date=end_date+1;
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

What if there were, say 3 years and 4 months between the two dates? 🙂

 

Would you then want 1 row saying 3years and 1 row saying 4months or 3 rows saying 1year and 1 saying 4 months?

varun11494
Fluorite | Level 6
4 entries
With 3 entries of 1 year each and the last entry of the remaining 4 months

Basically the difference of dates should not be greater than 1 year and at
the same time should be unique. Like in this case if policy starts on
1apr2010 to 31july13 i.e 3 years and 4 months. I Want it to be shown like
Policyno. Start date end date diff
X 01apr2010 31mar2011 1year
X 01apr2011 31mar2012 1year
X 01apr2012 31mar2013 1year
X 01apr2013 31july2013 4m

##- Please type your reply above this line. Simple formatting, no
attachments. -##
PeterClemmensen
Tourmaline | Level 20

And how do you define the 1 year between the two dates? Because if you use the YRDIF function which I would usually do in this situation, the difference between the 10mar2016 and 09mar2017 is barely 1 year. See below 🙂

 

data have;
input Policyno start_date :date9. end_date :date9.;
format start_date end_date date9.;
datalines;
1 10mar2016 09mar2017
1 10mar2016 09mar2017
2 01apr2016 31jan2018
3 05may2016 04may2017
3 05may2017 04may2018
;

data test;
   set have;
   difference_in_years = yrdif(start_date, end_date);
run;

proc print data = test;
run;
LinusH
Tourmaline | Level 20

For 1) PROC SORT w NODUPKEY should do the trick.

For 2), use IF to test the values of your start/end variables, and conditionally do explicit OUTPUT, and assign new values to start/end.

Data never sleeps
varun11494
Fluorite | Level 6

THANKYOU FOR YOUR TIME.
I SOLVED THE FIRST PROBLEM BUT COULD YOU HELP ME WITH THE SECOND BY TAKING POLICYNO. 2 AS EXAMPLE.
HOW TO SPLIT IT INTO DUPLICATE ENTRIES WITH DIFFERENT START DATE WITH MAXIMUM GAP OF ONE YEAR

PeterClemmensen
Tourmaline | Level 20

@varun11494 how do you define a years difference? 🙂

varun11494
Fluorite | Level 6

thankyou for your time
364 or 365 days
suppose policy start date is 03mar2015 so its end date should be 02mar2016 and the remaining days should be in next entries following the same pattern.
so what i have started is i found out the difference in dates and approximated it to next integer value so if difference is 1.2 or 1.5 or 1.8 years it will show 2 years as difference.from there i want to create duplicate number of rows row matching the difference(approximated to next integer).what i want is just like in this case the difference is 2 so 2 enteries shall create for it.
if the difference would be 3 so it shall automatically create 3 duplicate entries with shift in dates.

 

Ksharp
Super User

data have;
input Policyno   (start_date      end_date) (: date9.) ;
format start_date end_date date9.;
cards;
     1           10mar2016   09mar2017         1year
     1           10mar2016   09mar2017          1year
     2           01apr2016   31jan2018             1year 10months
     3           05may2016  04may2017          1 year
     3           05may2017  04may2018   
;
run;
data temp;
 set have;
 by policyno start_date end_date;
 if first.end_date;
run;
data want;
 set temp;
 yr=yrdif(start_date,end_date);
 _date1=start_date;_date2=end_date;
 do i=1 to int(yr);
  start_date=_date1 ; end_date=intnx('year',_date1,1,'s')-1;
  output;
  _date1=end_date+1;

 end;
 if mod(yr,1) ne 0 then do;
  start_date=_date1;end_date=_date2;
  output;
 end;
 drop _: i yr;
run;


mkeintz
PROC Star

This is a good problem for using an OUTPUT statement inside a do loop, and also as an example of a do loop defined in this structure:

 

   do x=1 by 1 while (some other condition).

 

Notes:

  1. For multiyear policies, the program below constantly adjusts START_DATE and END_DATE until START_DATE exceeds the original end_date (e_date in the program).
  2. This program assumes no policy exceed 10 years ("N=1 to 10").
  3. The intck function uses the "continuous" method argument, which means it counts date "boudaries" based on the first date argument, not on the normal calendar month boundary, nor on a user-specified "boundary-shift" parameter.  Great for getting anniveraries.
  4. The CATX function concatenates the 2nd, 3rd, etc arguments, and uses the first argument as a separator.

 

data want (drop=e_date n);
  set have;
  by policyno start_date end_date;
  if first.end_date;

  e_date=end_date;
  do N=1 to 10 while (start_date < e_date);
    end_date=min(e_date,intnx('year',start_date,1,'s')-1);
    length date_dif $20;
    date_dif='One Year';
    if end_date<intnx('year',start_date,1,'s')-1 then 
      date_dif=catx(' ',intck('month',start_date,end_date+1,'continuous'),'months');
    output;
    start_date=end_date+1;
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
varun11494
Fluorite | Level 6

Hi
your coding worked

Can you please explain me step by step how his worked.It would be helpful for me to understand
datalines..

  s.no policy no      start_date        end_date

    1       32           04apr2003      03apr2004 

    2       32           04apr2003      03apr2004

 

Thankyou for your time

mkeintz
PROC Star

The begining of the program has

 

   set have;

   by policyno start_date end_date;

   if first.end_date;

 

The BY statement tells SAS to (1) expect the data to be pre-sorted by policyno start_date end_date, and (2) set automatic dummies for each of the by variables indicaiing whether the record in hand is the first record for a given by value (first.policyno, first.start_date, first.end_date) or the last (last.policyno etc.).    And what happens to the first. dummies when policyno changes but the dates don't? The treatment of by values is hierarchical, so if a given by var changes setting its first. dummy, all lower order (to the right) first. dummies  are also set - no matter what their value sequence is.

 

The third statement is a subsetting IF (look up sas "subsetting if") which tells sas to keep only records which are the first instance of a given policyno/start_date/end_date combination.  As a result the DO group never processes duplicates.

 

If you want an explanation of how the DO group works, take a multiyear policy, run the do group. But inside the do group place a set of PUT statements to see what is happening to N, start_date, end_date, and date_dif:

 

 

e_date=end_date;

put 'A: ' e_date=yymmddn8.;

  do N=1 to 10 while (start_date < e_date);

    put / 'B: ' N= start_date= end_date=;
    end_date=min(e_date,intnx('year',start_date,1,'s')-1);

    put  'C: '  end_date=;
    length date_dif $20;
    date_dif='One Year';
    if end_date<intnx('year',start_date,1,'s')-1 then
      date_dif=catx(' ',intck('month',start_date,end_date+1,'continuous'),'months');

      put 'D: ' date_dif=;
    output;
    start_date=end_date+1;

    put 'E: ' start_date=;
  end;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2465 views
  • 2 likes
  • 5 in conversation