how do i remove the row having the same data and also create new rows by splitting
for example:insurance policy data
Policyno. start_date end_date difference in dates
1 10mar2016 09mar2017 1year
1 10mar2016 09mar2017 1year
2 01apr2016 31jan2018 1year 10months
3 05may2016 04may2017 1 year
3 05may2017 04may2018 1 year
what i want:
1)remove the field having the same data for eg of policyno. 1.I want only one field with unique data.
2)the difference should be not more than one year,if so then split the data into 2 rows or 3 rows depending on the difference for eg
in policy no.2 difference is 1 year 10 months i want it to be split into 2 rows.
Policyno. start_date end_date difference in dates
2 01apr2016 31mar2017 1year
2 01apr2017 31jan2018 10 months(which is less than 1 year)
thankyou in advance
This is a good problem for using an OUTPUT statement inside a do loop, and also as an example of a do loop defined in this structure:
do x=1 by 1 while (some other condition).
Notes:
data want (drop=e_date n);
set have;
by policyno start_date end_date;
if first.end_date;
e_date=end_date;
do N=1 to 10 while (start_date < e_date);
end_date=min(e_date,intnx('year',start_date,1,'s')-1);
length date_dif $20;
date_dif='One Year';
if end_date<intnx('year',start_date,1,'s')-1 then
date_dif=catx(' ',intck('month',start_date,end_date+1,'continuous'),'months');
output;
start_date=end_date+1;
end;
run;
What if there were, say 3 years and 4 months between the two dates? 🙂
Would you then want 1 row saying 3years and 1 row saying 4months or 3 rows saying 1year and 1 saying 4 months?
And how do you define the 1 year between the two dates? Because if you use the YRDIF function which I would usually do in this situation, the difference between the 10mar2016 and 09mar2017 is barely 1 year. See below 🙂
data have;
input Policyno start_date :date9. end_date :date9.;
format start_date end_date date9.;
datalines;
1 10mar2016 09mar2017
1 10mar2016 09mar2017
2 01apr2016 31jan2018
3 05may2016 04may2017
3 05may2017 04may2018
;
data test;
set have;
difference_in_years = yrdif(start_date, end_date);
run;
proc print data = test;
run;
For 1) PROC SORT w NODUPKEY should do the trick.
For 2), use IF to test the values of your start/end variables, and conditionally do explicit OUTPUT, and assign new values to start/end.
THANKYOU FOR YOUR TIME.
I SOLVED THE FIRST PROBLEM BUT COULD YOU HELP ME WITH THE SECOND BY TAKING POLICYNO. 2 AS EXAMPLE.
HOW TO SPLIT IT INTO DUPLICATE ENTRIES WITH DIFFERENT START DATE WITH MAXIMUM GAP OF ONE YEAR
@varun11494 how do you define a years difference? 🙂
thankyou for your time
364 or 365 days
suppose policy start date is 03mar2015 so its end date should be 02mar2016 and the remaining days should be in next entries following the same pattern.
so what i have started is i found out the difference in dates and approximated it to next integer value so if difference is 1.2 or 1.5 or 1.8 years it will show 2 years as difference.from there i want to create duplicate number of rows row matching the difference(approximated to next integer).what i want is just like in this case the difference is 2 so 2 enteries shall create for it.
if the difference would be 3 so it shall automatically create 3 duplicate entries with shift in dates.
data have; input Policyno (start_date end_date) (: date9.) ; format start_date end_date date9.; cards; 1 10mar2016 09mar2017 1year 1 10mar2016 09mar2017 1year 2 01apr2016 31jan2018 1year 10months 3 05may2016 04may2017 1 year 3 05may2017 04may2018 ; run; data temp; set have; by policyno start_date end_date; if first.end_date; run; data want; set temp; yr=yrdif(start_date,end_date); _date1=start_date;_date2=end_date; do i=1 to int(yr); start_date=_date1 ; end_date=intnx('year',_date1,1,'s')-1; output; _date1=end_date+1; end; if mod(yr,1) ne 0 then do; start_date=_date1;end_date=_date2; output; end; drop _: i yr; run;
This is a good problem for using an OUTPUT statement inside a do loop, and also as an example of a do loop defined in this structure:
do x=1 by 1 while (some other condition).
Notes:
data want (drop=e_date n);
set have;
by policyno start_date end_date;
if first.end_date;
e_date=end_date;
do N=1 to 10 while (start_date < e_date);
end_date=min(e_date,intnx('year',start_date,1,'s')-1);
length date_dif $20;
date_dif='One Year';
if end_date<intnx('year',start_date,1,'s')-1 then
date_dif=catx(' ',intck('month',start_date,end_date+1,'continuous'),'months');
output;
start_date=end_date+1;
end;
run;
Hi
your coding worked
Can you please explain me step by step how his worked.It would be helpful for me to understand
datalines..
s.no policy no start_date end_date
1 32 04apr2003 03apr2004
2 32 04apr2003 03apr2004
Thankyou for your time
The begining of the program has
set have;
by policyno start_date end_date;
if first.end_date;
The BY statement tells SAS to (1) expect the data to be pre-sorted by policyno start_date end_date, and (2) set automatic dummies for each of the by variables indicaiing whether the record in hand is the first record for a given by value (first.policyno, first.start_date, first.end_date) or the last (last.policyno etc.). And what happens to the first. dummies when policyno changes but the dates don't? The treatment of by values is hierarchical, so if a given by var changes setting its first. dummy, all lower order (to the right) first. dummies are also set - no matter what their value sequence is.
The third statement is a subsetting IF (look up sas "subsetting if") which tells sas to keep only records which are the first instance of a given policyno/start_date/end_date combination. As a result the DO group never processes duplicates.
If you want an explanation of how the DO group works, take a multiyear policy, run the do group. But inside the do group place a set of PUT statements to see what is happening to N, start_date, end_date, and date_dif:
e_date=end_date;
put 'A: ' e_date=yymmddn8.;
do N=1 to 10 while (start_date < e_date);
put / 'B: ' N= start_date= end_date=;
end_date=min(e_date,intnx('year',start_date,1,'s')-1);
put 'C: ' end_date=;
length date_dif $20;
date_dif='One Year';
if end_date<intnx('year',start_date,1,'s')-1 then
date_dif=catx(' ',intck('month',start_date,end_date+1,'continuous'),'months');
put 'D: ' date_dif=;
output;
start_date=end_date+1;
put 'E: ' start_date=;
end;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.