Solved: Fill In Missing Dates for Each Possible Combination

acemanhattan · Posted 08-14-2017 10:43 AM

I have a set of data which is orgainized like this

State	LOB1	LOB2	Dt1	Dt2	Value1	Value2
.	.	.	.	.	.	.
.	.	.	.	.	.	.
ID	Biz1	Biz1A	201401	201301	127	1415
ID	Biz1	Biz1A	201401	201305	654	4673
ID	Biz1	Biz1A	201401	201307	98	299
ID	Biz1	Biz1A	201401	201309	123	30
ID	Biz1	Biz1A	201401	201310	145	1152
ID	Biz1	Biz1A	201401	201312	125	24
ID	Biz1	Biz1A	201401	201401	6787	19534
ID	Biz1	Biz1A	201402	201301	43	381
ID	Biz1	Biz1A	201402	201302	11	4
ID	Biz1	Biz1A	201402	201307	124	285
ID	Biz1	Biz1A	201402	201311	45	357
ID	Biz1	Biz1A	201402	201312	3	1
.	.	.	.	.	.	.
.	.	.	.	.	.	.

Where there are 4 choices of state (shown here is ID), LOB1 comes from 20 different values, and LOB2 comes from 10 diifferent values, Dt1 ranges from 201301-201707, and Dt2 ranges from 201301-201707 (but Dt2 <= Dt1). I basically need to fill in the data such that there are no missing date combinations for any of the combinations of State, LOB1, LOB2 that exist in my data.

The data above would look like this, where the "x" denote observations that have been added:

	State	LOB1	LOB2	Dt1	Dt2	Value1	Value2
	.	.	.	.	.	.	.
	.	.	.	.	.	.	.
	ID	Biz1	Biz1A	201401	201301	127	1415
x	ID	Biz1	Biz1A	201401	201302		.
x	ID	Biz1	Biz1A	201401	201303		.
x	ID	Biz1	Biz1A	201401	201304		.
	ID	Biz1	Biz1A	201401	201305	654	4673
x	ID	Biz1	Biz1A	201401	201306		.
	ID	Biz1	Biz1A	201401	201307	98	299
x	ID	Biz1	Biz1A	201401	201308		.
	ID	Biz1	Biz1A	201401	201309	123	30
	ID	Biz1	Biz1A	201401	201310	145	1152
x	ID	Biz1	Biz1A	201401	201311		.
	ID	Biz1	Biz1A	201401	201312	125	24
	ID	Biz1	Biz1A	201401	201401	6787	19534
	ID	Biz1	Biz1A	201402	201301	43	381
	ID	Biz1	Biz1A	201402	201302	11	4
x	ID	Biz1	Biz1A	201402	201303		.
x	ID	Biz1	Biz1A	201402	201304		.
x	ID	Biz1	Biz1A	201402	201305		.
x	ID	Biz1	Biz1A	201402	201306		.
	ID	Biz1	Biz1A	201402	201307	124	285
x	ID	Biz1	Biz1A	201402	201308		.
x	ID	Biz1	Biz1A	201402	201309		.
x	ID	Biz1	Biz1A	201402	201310		.
	ID	Biz1	Biz1A	201402	201311	45	357
	ID	Biz1	Biz1A	201402	201312	3	1
	.	.	.	.	.	.	.
	.	.	.	.	.	.	.

Thanks in advance.

Ksharp · Posted 08-15-2017 10:40 AM

data have;
input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2;
dt1 = mdy(mod(xdt1,100),1,int(xdt1/100));
dt2 = mdy(mod(xdt2,100),1,int(xdt2/100));
format dt1 dt2 yymmn6.;
drop x:;
datalines; 
ID Biz1 Biz1A 201401 201301 127 1415 
ID Biz1 Biz1A 201401 201305 654 4673 
ID Biz1 Biz1A 201401 201307 98 299 
ID Biz1 Biz1A 201401 201309 123 30 
ID Biz1 Biz1A 201401 201310 145 1152 
ID Biz1 Biz1A 201401 201312 125 24 
ID Biz1 Biz1A 201401 201401 6787 19534 
ID Biz1 Biz1A 201402 201301 43 381 
ID Biz1 Biz1A 201402 201302 11 4 
ID Biz1 Biz1A 201402 201307 124 285 
ID Biz1 Biz1A 201402 201311 45 357 
ID Biz1 Biz1A 201402 201312 3 1 
;

data want;
 merge have have(firstobs=2 keep=State dt1 dt2 rename=(State=_State dt1=_dt1 dt2=_dt2));
 output;
 if State=_State and dt1=_dt1 then do;
  do i=1 to intck('month',dt2,_dt2)-1;
   dt2=intnx('month',dt2,1);
   value1=.;value2=.;
   output;
 end;
end;
drop i _:;
run;

View solution in original post

Reeza · Posted 08-14-2017 11:13 AM

PROC TIMESERIES - if you dates are SAS dates and you have SAS/ETS licensed.

https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52

acemanhattan · Posted 08-14-2017 02:08 PM

It looks like I'll want to switch the format of my YYYYMM date values, yes? To something that SAS processes as a date?

tomrvincent · Posted 08-14-2017 01:28 PM

write out the two months to 2 datasets and then join them without a join statement. That'll give you the cartesian product.

acemanhattan · Posted 08-14-2017 01:52 PM

@tomrvincent

Ignoring for a second that I don't necessarily know how to do that (though I understand what you're proposing), will the Cartesian product give me the combinations of STATE, LOB1, LOB2 I need?

I was thinking of a solution similar to this, where I generate the range of Month2 across the range of Month1, and then join to my dataset, but I couldn't figure out how I'd get the other necessary values to be anything other than blank.

tomrvincent · Posted 08-14-2017 02:17 PM

no reason why it can't. I just did it by dt and value with no joins at all...got 26 records.

here's what I did (joining by state is all I tried):

/* get distinct '1' values */

PROC SQL;
   CREATE TABLE WORK.q1 AS
   SELECT DISTINCT t1.State,
          t1.LOB1,
          t1.Value1,
          t1.Dt1
      FROM WORK.BOOK1 t1;

/* get distinct '2' values */

CREATE TABLE WORK.q2 AS
   SELECT DISTINCT t1.State,
          t1.LOB2,
          t1.Value2,
          t1.Dt2
      FROM WORK.BOOK1 t1;
/* join together by state only */

   CREATE TABLE WORK.q3 AS
   SELECT t1.State,
          t1.LOB1,
          t1.Value1,
          t1.Dt1,
          t2.State AS State1,
          t2.LOB2,
          t2.Value2,
          t2.Dt2
      FROM WORK.Q1 t1
           INNER JOIN WORK.Q2 t2 ON (t1.State = t2.State);
QUIT;

results in 294 records.

PGStats · Posted 08-15-2017 12:23 AM

Looks like all you need is to fill in the missing dt2 months (with value1 and value2 set to missing for the added dates)

This would do it:

data have;
input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2;
dt1 = mdy(mod(xdt1,100),1,int(xdt1/100));
dt2 = mdy(mod(xdt2,100),1,int(xdt2/100));
format dt1 dt2 yymm7.;
drop x:;
datalines; 
ID Biz1 Biz1A 201401 201301 127 1415 
ID Biz1 Biz1A 201401 201305 654 4673 
ID Biz1 Biz1A 201401 201307 98 299 
ID Biz1 Biz1A 201401 201309 123 30 
ID Biz1 Biz1A 201401 201310 145 1152 
ID Biz1 Biz1A 201401 201312 125 24 
ID Biz1 Biz1A 201401 201401 6787 19534 
ID Biz1 Biz1A 201402 201301 43 381 
ID Biz1 Biz1A 201402 201302 11 4 
ID Biz1 Biz1A 201402 201307 124 285 
ID Biz1 Biz1A 201402 201311 45 357 
ID Biz1 Biz1A 201402 201312 3 1 
;

proc sort data=have; by state lob1 lob2 dt1 dt2; run;

data want;
dt2 = constant("BIG");
do until(last.dt1);
    set have(rename=(dt2=_dt2 value1=_value1 value2=_value2)); 
    by state lob1 lob2 dt1;
    call missing(value1, value2);
    do while(dt2 < _dt2);
        output;
        dt2 = intnx("month",dt2,1);
        end;
    dt2 = _dt2;
    value1 = _value1;
    value2 = _value2;
    output;
    dt2 = intnx("month",dt2,1);
    end;
format dt2 yymm7.;
drop _: ;
run;

PG

Ksharp · Posted 08-15-2017 10:40 AM

data have;
input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2;
dt1 = mdy(mod(xdt1,100),1,int(xdt1/100));
dt2 = mdy(mod(xdt2,100),1,int(xdt2/100));
format dt1 dt2 yymmn6.;
drop x:;
datalines; 
ID Biz1 Biz1A 201401 201301 127 1415 
ID Biz1 Biz1A 201401 201305 654 4673 
ID Biz1 Biz1A 201401 201307 98 299 
ID Biz1 Biz1A 201401 201309 123 30 
ID Biz1 Biz1A 201401 201310 145 1152 
ID Biz1 Biz1A 201401 201312 125 24 
ID Biz1 Biz1A 201401 201401 6787 19534 
ID Biz1 Biz1A 201402 201301 43 381 
ID Biz1 Biz1A 201402 201302 11 4 
ID Biz1 Biz1A 201402 201307 124 285 
ID Biz1 Biz1A 201402 201311 45 357 
ID Biz1 Biz1A 201402 201312 3 1 
;

data want;
 merge have have(firstobs=2 keep=State dt1 dt2 rename=(State=_State dt1=_dt1 dt2=_dt2));
 output;
 if State=_State and dt1=_dt1 then do;
  do i=1 to intck('month',dt2,_dt2)-1;
   dt2=intnx('month',dt2,1);
   value1=.;value2=.;
   output;
 end;
end;
drop i _:;
run;

acemanhattan · Posted 08-15-2017 12:26 PM

@Ksharp This is great, and elegant, except it only seems to work on data if I've manually entered it like you did in your post. For example

data DUMMY;
   set personal.all_states_data_201707_A01;
   if INFORCE=201301;
   If mod(Pdmo,2)=0;
run;

Generates this Dummy data set:

state	LOB	LOB_CUST_TYPE	INFORCE	PDMO	PDAMT	TOTMEMS
ID	Biz1	BIZ1A	201301	201302	$282,691.60	0.666666667
ID	Biz1	BIZ1A	201301	201304	$42,402.79	0.5
ID	Biz1	BIZ1A	201301	201306	$3,665.05	-0.5
ID	Biz1	BIZ1A	201301	201308	$5,117.68	0.666666667
ID	Biz1	BIZ1A	201301	201310	$1,664.52	0
ID	Biz1	BIZ1A	201301	201312	$2,639.59	0
ID	Biz1	BIZ1A	201301	201402	$1,239.98	0
ID	Biz1	BIZ1A	201301	201404	$8,548.61	0
ID	Biz1	BIZ1A	201301	201406	($7.31)	0
ID	Biz1	BIZ1A	201301	201408	$1,337.29	0
ID	Biz1	BIZ1A	201301	201410	($44.43)	0
ID	Biz1	BIZ1A	201301	201412	$75.20	0
ID	Biz1	BIZ1A	201301	201502	$49.75	0
ID	Biz1	BIZ1A	201301	201508	($13.33)	0
ID	Biz1	BIZ1A	201301	201510	($11.43)	0
ID	Biz1	BIZ1A	201301	201702	($22.50)	0

But then, this

data results_1;
 merge dummy dummy(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));
 output;
 if State=_State and INFORCE=_INFORCE then do;
  do i=1 to intck('month',PDMO,_PDMO)-1;
   PDMO=intnx('month',PDMO,1);
   PDAMT=.;TOTMEMS=.;
   output;
 end;
end;
drop i _:;
run;

Generates these results

state	LOB	LOB_CUST_TYPE	INFORCE	PDMO	PDAMT	TOTMEMS
ID	Biz1	BIZ1A	201301	201302	$282,691.60	0.666666667
ID	Biz1	BIZ1A	201301	201304	$42,402.79	0.5
ID	Biz1	BIZ1A	201301	201306	$3,665.05	-0.5
ID	Biz1	BIZ1A	201301	201308	$5,117.68	0.666666667
ID	Biz1	BIZ1A	201301	201310	$1,664.52	0
ID	Biz1	BIZ1A	201301	201312	$2,639.59	0
ID	Biz1	BIZ1A	201301	201339	.	.
ID	Biz1	BIZ1A	201301	201369	.	.
ID	Biz1	BIZ1A	201301	201402	$1,239.98	0
ID	Biz1	BIZ1A	201301	201404	$8,548.61	0
ID	Biz1	BIZ1A	201301	201406	($7.31)	0
ID	Biz1	BIZ1A	201301	201408	$1,337.29	0
ID	Biz1	BIZ1A	201301	201410	($44.43)	0
ID	Biz1	BIZ1A	201301	201412	$75.20	0
ID	Biz1	BIZ1A	201301	201430	.	.
ID	Biz1	BIZ1A	201301	201461	.	.
ID	Biz1	BIZ1A	201301	201502	$49.75	0
ID	Biz1	BIZ1A	201301	201508	($13.33)	0
ID	Biz1	BIZ1A	201301	201510	($11.43)	0
ID	Biz1	BIZ1A	201301	201522	.	.
ID	Biz1	BIZ1A	201301	201553	.	.
ID	Biz1	BIZ1A	201301	201583	.	.
ID	Biz1	BIZ1A	201301	201614	.	.
ID	Biz1	BIZ1A	201301	201645	.	.
ID	Biz1	BIZ1A	201301	201702	($22.50)	0

Yet, if I take the same data set and enter it in this fashiion, I get thte desired results.

data have;
input (State LOB LOB_CUST_TYPE) ($) INFORCE PDMO PDAMT TOTMEMS;
INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100));
PDMO = mdy(mod(PDMO,100),1,int(PDMO/100));
format INFORCE PDMO yymmn6.;
drop x:;
datalines; 
ID Biz1 BIZ1A 201301 201302 282691.6 0.6666666667
ID Biz1 BIZ1A 201301 201304 42402.79 0.5
ID Biz1 BIZ1A 201301 201306 3665.05 -0.5
ID Biz1 BIZ1A 201301 201308 5117.68 0.6666666667
ID Biz1 BIZ1A 201301 201310 1664.52 0
ID Biz1 BIZ1A 201301 201312 2639.59 0
ID Biz1 BIZ1A 201301 201402 1239.98 0
ID Biz1 BIZ1A 201301 201404 8548.61 0
ID Biz1 BIZ1A 201301 201406 -7.31 0
ID Biz1 BIZ1A 201301 201408 1337.29 0
ID Biz1 BIZ1A 201301 201410 -44.43 0
ID Biz1 BIZ1A 201301 201412 75.2 0
ID Biz1 BIZ1A 201301 201502 49.75 0
ID Biz1 BIZ1A 201301 201508 -13.33 0
ID Biz1 BIZ1A 201301 201510 -11.43 0
ID Biz1 BIZ1A 201301 201702 -22.5 0
;

data results_2;
 merge have have(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));
 output;
 if State=_State and INFORCE=_INFORCE then do;
  do i=1 to intck('month',PDMO,_PDMO)-1;
   PDMO=intnx('month',PDMO,1);
   PDAMT=.;TOTMEMS=.;
   output;
 end;
drop i _:;
run;

Ksharp · Posted 08-16-2017 08:49 AM

That is because PDMO is not a DATE type variable.

Chang it into DATE firstly as PG did .

INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100));
PDMO = mdy(mod(PDMO,100),1,int(PDMO/100));

acemanhattan · Posted 08-16-2017 10:34 AM

@Ksharp, where within the code below would you expect placing those two lines of code to give us the desired results? I've tried placing them between every space, and I get varied results, but none of them what I'd hope for.

data results_1;

merge dummy dummy(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));

 output;
 
 if State=_State and INFORCE=_INFORCE then do;

  do i=1 to intck('month',PDMO,_PDMO)-1;

   PDMO=intnx('month',PDMO,1);
 
   PDAMT=.;TOTMEMS=.;
   
   output;
 end;
end;
drop i _:;
run;

Ksharp · Posted 08-16-2017 10:48 AM

When generating the DUMMY table, change it into DATE type variable.
and your INFORCE and PDMO  is numeric variable.


data DUMMY;
   set personal.all_states_data_201707_A01;

INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100));
PDMO = mdy(mod(PDMO,100),1,int(PDMO/100));


run;



After that , run my code .

Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Re: Fill In Missing Dates for Each Possible Combination

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away