Hi...I have a dataset that contains the records for the application for product coverage which is given for a limited time period. I would like to obtain the overall coverage for each ID by Product number. If the coverage would have been overlapping for the same product number for each ID, it would have been easy by grouping by ID and Product Number and using the minimum of the Effective Dates and maximumof the Expiry Dates. However, it is possible to have a gap between coverages and this is the part that I am having problems with. Any suggestions....Thanks.
Have:
| ID | Product Number | Effective Date | Expiry Date | 
| 2303 | 11232 | 20160102 | 20160227 | 
| 2303 | 11232 | 20160212 | 20160325 | 
| 2303 | 14356 | 20160212 | 20160325 | 
| 2303 | 14356 | 20160325 | 20160420 | 
| 2409 | 17809 | 20160113 | 20160217 | 
| 2409 | 17809 | 20160220 | 20160326 | 
| 2409 | 17809 | 20160301 | 20160320 | 
| 2409 | 17809 | 20160324 | 20160430 | 
| 2409 | 18488 | 20160119 | 20160219 | 
| 2409 | 18488 | 20160218 | 20160318 | 
| 2512 | 15932 | 20160123 | 20160227 | 
| 2512 | 15932 | 20160227 | 20160329 | 
| 2512 | 16554 | 20160124 | 20160213 | 
| 2512 | 16554 | 20160214 | 20160315 | 
| 2512 | 16554 | 20160314 | 20160417 | 
| 2512 | 16554 | 20160420 | 20160521 | 
| 2512 | 16554 | 20160519 | 20160620 | 
Want:
| ID | Product Number | Effective Date | Expiry Date | 
| 2303 | 11232 | 20160102 | 20160325 | 
| 2303 | 14356 | 20160212 | 20160420 | 
| 2409 | 17809 | 20160113 | 20160217 | 
| 2409 | 17809 | 20160220 | 20160430 | 
| 2409 | 18488 | 20160119 | 20160318 | 
| 2512 | 15932 | 20160123 | 20160329 | 
| 2512 | 16554 | 20160124 | 20160213 | 
| 2512 | 16554 | 20160214 | 20160417 | 
| 2512 | 16554 | 20160420 | 20160620 | 
This should do the trick:
proc sort data=prod; 
by ID ProductNumber EffectiveDate descending ExpiryDate; 
run;
data want;
do until(last.productNumber);
    set prod; by id productNumber;
    format effD expD yymmdd10.;
    if expD < effectiveDate then
        if not missing(expD) then do;
             output;
             call missing(effD, expD);
             end;
    effD = min(effD, effectiveDate);
    expD = max(expD, expiryDate);
    end;
output;
drop effectiveDate expiryDate;
rename effD=effectiveDate expD=expiryDate;
run;
proc print data=want noobs; run;Like this? (edited: more robust on case of nested date intervals).
data HAVE;
 input ID $ Product_Number $ Effective_Date 8. Expiry_Date : 8.;
 cards;
2303 11232 20160102 20160227 
2303 11232 20160212 20160325 
2303 14356 20160212 20160325 
2303 14356 20160325 20160420 
2409 17809 20160113 20160217 
2409 17809 20160220 20160326 
2409 17809 20160301 20160320 
2409 17809 20160324 20160430 
2409 18488 20160119 20160219 
2409 18488 20160218 20160318 
2512 15932 20160123 20160227 
2512 15932 20160227 20160329 
2512 16554 20160124 20160213 
2512 16554 20160214 20160315 
2512 16554 20160314 20160417 
2512 16554 20160420 20160521 
2512 16554 20160519 20160620
2513 16555 20160314 20160417 
2513 16555 20160415 20160416 
2513 16555 20160522 20160620
run;
data WANT; 
  set HAVE (rename=(Effective_Date=D1 Expiry_Date=D2));
  by ID Product_Number D1;      
  retain KEEP_D1 KEEP_D2 ;
  if KEEP_D1=. then do;            %* date interval not set; 
    KEEP_D1=D1;                    %* start new date interval;
    KEEP_D2=D2;
  end;
  if D1 > KEEP_D2 > . then do;     %* same product and gap between dates;
    Effective_Date = KEEP_D1;      %* save old dates to table; 
    Expiry_Date    = KEEP_D2;      
    output;
    KEEP_D1 = D1 ;                 %* start new date interval with new dates;
    KEEP_D2 = D2 ;
  end;
  KEEP_D2=max(KEEP_D2,D2);         %* update end date of interval if greater ;
  if last.Product_Number then do;  %* last record for product;
    Effective_Date = KEEP_D1;      %* save dates to table; 
    Expiry_Date    = KEEP_D2;   
    output;
    KEEP_D1=.;                     %* reset date interval;
  end;
run; 
 
There are many scenario you need to consider about .
data have;
infile datalines expandtabs truncover;
input ID	ProductNumber	(EffectiveDate	ExpiryDate) (: yymmdd10.);
format EffectiveDate	ExpiryDate yymmdd10.;
datalines;
2303	11232	20160102 20160227 
2303	11232	20160212 20160325
2303	14356	20160212	20160325
2303	14356	20160325	20160420
2409	17809	20160113	20160217
2409	17809	20160220	20160326
2409	17809	20160301	20160320
2409	17809	20160324	20160430
2409	18488	20160119	20160219
2409	18488	20160218	20160318
2512	15932	20160123	20160227
2512	15932	20160227	20160329
2512	16554	20160124	20160213
2512	16554	20160214	20160315
2512	16554	20160314	20160417
2512	16554	20160420	20160521
2512	16554	20160519	20160620
;
run;
data temp;
 set have;
 n+1;
 date=EffectiveDate;output;
 date=ExpiryDate;output;
 format date yymmdd10.;
 drop EffectiveDate	ExpiryDate;
run;
data temp;
 set temp;
 by ID	ProductNumber n;
 if first.ProductNumber or (first.n and date gt lag(date)+1) then group+1;
run;
data want;
 do until(last.group);
  set temp;
  by group;
  if first.group then EffectiveDate=date;
 end;
 ExpiryDate=date;
 drop date n group;
 format EffectiveDate ExpiryDate yymmdd10.;
run;This should do the trick:
proc sort data=prod; 
by ID ProductNumber EffectiveDate descending ExpiryDate; 
run;
data want;
do until(last.productNumber);
    set prod; by id productNumber;
    format effD expD yymmdd10.;
    if expD < effectiveDate then
        if not missing(expD) then do;
             output;
             call missing(effD, expD);
             end;
    effD = min(effD, effectiveDate);
    expD = max(expD, expiryDate);
    end;
output;
drop effectiveDate expiryDate;
rename effD=effectiveDate expD=expiryDate;
run;
proc print data=want noobs; run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
