Hi...I have a dataset that contains the records for the application for product coverage which is given for a limited time period. I would like to obtain the overall coverage for each ID by Product number. If the coverage would have been overlapping for the same product number for each ID, it would have been easy by grouping by ID and Product Number and using the minimum of the Effective Dates and maximumof the Expiry Dates. However, it is possible to have a gap between coverages and this is the part that I am having problems with. Any suggestions....Thanks.
Have:
ID | Product Number | Effective Date | Expiry Date |
2303 | 11232 | 20160102 | 20160227 |
2303 | 11232 | 20160212 | 20160325 |
2303 | 14356 | 20160212 | 20160325 |
2303 | 14356 | 20160325 | 20160420 |
2409 | 17809 | 20160113 | 20160217 |
2409 | 17809 | 20160220 | 20160326 |
2409 | 17809 | 20160301 | 20160320 |
2409 | 17809 | 20160324 | 20160430 |
2409 | 18488 | 20160119 | 20160219 |
2409 | 18488 | 20160218 | 20160318 |
2512 | 15932 | 20160123 | 20160227 |
2512 | 15932 | 20160227 | 20160329 |
2512 | 16554 | 20160124 | 20160213 |
2512 | 16554 | 20160214 | 20160315 |
2512 | 16554 | 20160314 | 20160417 |
2512 | 16554 | 20160420 | 20160521 |
2512 | 16554 | 20160519 |
20160620 |
Want:
ID | Product Number | Effective Date | Expiry Date |
2303 | 11232 | 20160102 | 20160325 |
2303 | 14356 | 20160212 | 20160420 |
2409 | 17809 | 20160113 | 20160217 |
2409 | 17809 | 20160220 | 20160430 |
2409 | 18488 | 20160119 | 20160318 |
2512 | 15932 | 20160123 | 20160329 |
2512 | 16554 | 20160124 | 20160213 |
2512 | 16554 | 20160214 | 20160417 |
2512 | 16554 | 20160420 | 20160620 |
This should do the trick:
proc sort data=prod;
by ID ProductNumber EffectiveDate descending ExpiryDate;
run;
data want;
do until(last.productNumber);
set prod; by id productNumber;
format effD expD yymmdd10.;
if expD < effectiveDate then
if not missing(expD) then do;
output;
call missing(effD, expD);
end;
effD = min(effD, effectiveDate);
expD = max(expD, expiryDate);
end;
output;
drop effectiveDate expiryDate;
rename effD=effectiveDate expD=expiryDate;
run;
proc print data=want noobs; run;
Like this? (edited: more robust on case of nested date intervals).
data HAVE;
input ID $ Product_Number $ Effective_Date 8. Expiry_Date : 8.;
cards;
2303 11232 20160102 20160227
2303 11232 20160212 20160325
2303 14356 20160212 20160325
2303 14356 20160325 20160420
2409 17809 20160113 20160217
2409 17809 20160220 20160326
2409 17809 20160301 20160320
2409 17809 20160324 20160430
2409 18488 20160119 20160219
2409 18488 20160218 20160318
2512 15932 20160123 20160227
2512 15932 20160227 20160329
2512 16554 20160124 20160213
2512 16554 20160214 20160315
2512 16554 20160314 20160417
2512 16554 20160420 20160521
2512 16554 20160519 20160620
2513 16555 20160314 20160417
2513 16555 20160415 20160416
2513 16555 20160522 20160620
run;
data WANT;
set HAVE (rename=(Effective_Date=D1 Expiry_Date=D2));
by ID Product_Number D1;
retain KEEP_D1 KEEP_D2 ;
if KEEP_D1=. then do; %* date interval not set;
KEEP_D1=D1; %* start new date interval;
KEEP_D2=D2;
end;
if D1 > KEEP_D2 > . then do; %* same product and gap between dates;
Effective_Date = KEEP_D1; %* save old dates to table;
Expiry_Date = KEEP_D2;
output;
KEEP_D1 = D1 ; %* start new date interval with new dates;
KEEP_D2 = D2 ;
end;
KEEP_D2=max(KEEP_D2,D2); %* update end date of interval if greater ;
if last.Product_Number then do; %* last record for product;
Effective_Date = KEEP_D1; %* save dates to table;
Expiry_Date = KEEP_D2;
output;
KEEP_D1=.; %* reset date interval;
end;
run;
There are many scenario you need to consider about .
data have;
infile datalines expandtabs truncover;
input ID ProductNumber (EffectiveDate ExpiryDate) (: yymmdd10.);
format EffectiveDate ExpiryDate yymmdd10.;
datalines;
2303 11232 20160102 20160227
2303 11232 20160212 20160325
2303 14356 20160212 20160325
2303 14356 20160325 20160420
2409 17809 20160113 20160217
2409 17809 20160220 20160326
2409 17809 20160301 20160320
2409 17809 20160324 20160430
2409 18488 20160119 20160219
2409 18488 20160218 20160318
2512 15932 20160123 20160227
2512 15932 20160227 20160329
2512 16554 20160124 20160213
2512 16554 20160214 20160315
2512 16554 20160314 20160417
2512 16554 20160420 20160521
2512 16554 20160519 20160620
;
run;
data temp;
set have;
n+1;
date=EffectiveDate;output;
date=ExpiryDate;output;
format date yymmdd10.;
drop EffectiveDate ExpiryDate;
run;
data temp;
set temp;
by ID ProductNumber n;
if first.ProductNumber or (first.n and date gt lag(date)+1) then group+1;
run;
data want;
do until(last.group);
set temp;
by group;
if first.group then EffectiveDate=date;
end;
ExpiryDate=date;
drop date n group;
format EffectiveDate ExpiryDate yymmdd10.;
run;
This should do the trick:
proc sort data=prod;
by ID ProductNumber EffectiveDate descending ExpiryDate;
run;
data want;
do until(last.productNumber);
set prod; by id productNumber;
format effD expD yymmdd10.;
if expD < effectiveDate then
if not missing(expD) then do;
output;
call missing(effD, expD);
end;
effD = min(effD, effectiveDate);
expD = max(expD, expiryDate);
end;
output;
drop effectiveDate expiryDate;
rename effD=effectiveDate expD=expiryDate;
run;
proc print data=want noobs; run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.