DATA Step, Macro, Functions and more

Fill In Missing Dates for Each Possible Combination

Accepted Solution Solved
Reply
Contributor
Posts: 46
Accepted Solution

Fill In Missing Dates for Each Possible Combination

I have a set of data which is orgainized like this

 

State LOB1 LOB2 Dt1 Dt2 Value1 Value2
. . . . . . .
. . . . . . .
ID Biz1 Biz1A 201401 201301 127 1415
ID Biz1 Biz1A 201401 201305 654 4673
ID Biz1 Biz1A 201401 201307 98 299
ID Biz1 Biz1A 201401 201309 123 30
ID Biz1 Biz1A 201401 201310 145 1152
ID Biz1 Biz1A 201401 201312 125 24
ID Biz1 Biz1A 201401 201401 6787 19534
ID Biz1 Biz1A 201402 201301 43 381
ID Biz1 Biz1A 201402 201302 11 4
ID Biz1 Biz1A 201402 201307 124 285
ID Biz1 Biz1A 201402 201311 45 357
ID Biz1 Biz1A 201402 201312 3 1
. . . . . . .
. . . . . . .

 

Where there are 4 choices of state (shown here is ID), LOB1 comes from 20 different values, and LOB2 comes from 10 diifferent values, Dt1 ranges from 201301-201707, and Dt2 ranges from 201301-201707 (but Dt2 <= Dt1). I basically need to fill in the data such that there are no missing date combinations for any of the combinations of State, LOB1, LOB2 that exist in my data.

 

The data above would look like this, where the "x" denote observations that have been added:

 

  State LOB1 LOB2 Dt1 Dt2 Value1 Value2
  . . . . . . .
  . . . . . . .
  ID Biz1 Biz1A 201401 201301 127 1415
x ID Biz1 Biz1A 201401 201302   .
x ID Biz1 Biz1A 201401 201303   .
x ID Biz1 Biz1A 201401 201304   .
  ID Biz1 Biz1A 201401 201305 654 4673
x ID Biz1 Biz1A 201401 201306   .
  ID Biz1 Biz1A 201401 201307 98 299
x ID Biz1 Biz1A 201401 201308   .
  ID Biz1 Biz1A 201401 201309 123 30
  ID Biz1 Biz1A 201401 201310 145 1152
x ID Biz1 Biz1A 201401 201311   .
  ID Biz1 Biz1A 201401 201312 125 24
  ID Biz1 Biz1A 201401 201401 6787 19534
  ID Biz1 Biz1A 201402 201301 43 381
  ID Biz1 Biz1A 201402 201302 11 4
x ID Biz1 Biz1A 201402 201303   .
x ID Biz1 Biz1A 201402 201304   .
x ID Biz1 Biz1A 201402 201305   .
x ID Biz1 Biz1A 201402 201306   .
  ID Biz1 Biz1A 201402 201307 124 285
x ID Biz1 Biz1A 201402 201308   .
x ID Biz1 Biz1A 201402 201309   .
x ID Biz1 Biz1A 201402 201310   .
  ID Biz1 Biz1A 201402 201311 45 357
  ID Biz1 Biz1A 201402 201312 3 1
  . . . . . . .
  . . . . . . .

 

Thanks in advance.


Accepted Solutions
Solution
‎08-16-2017 11:19 AM
Super User
Posts: 10,028

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to acemanhattan
data have;
input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2;
dt1 = mdy(mod(xdt1,100),1,int(xdt1/100));
dt2 = mdy(mod(xdt2,100),1,int(xdt2/100));
format dt1 dt2 yymmn6.;
drop x:;
datalines; 
ID Biz1 Biz1A 201401 201301 127 1415 
ID Biz1 Biz1A 201401 201305 654 4673 
ID Biz1 Biz1A 201401 201307 98 299 
ID Biz1 Biz1A 201401 201309 123 30 
ID Biz1 Biz1A 201401 201310 145 1152 
ID Biz1 Biz1A 201401 201312 125 24 
ID Biz1 Biz1A 201401 201401 6787 19534 
ID Biz1 Biz1A 201402 201301 43 381 
ID Biz1 Biz1A 201402 201302 11 4 
ID Biz1 Biz1A 201402 201307 124 285 
ID Biz1 Biz1A 201402 201311 45 357 
ID Biz1 Biz1A 201402 201312 3 1 
;

data want;
 merge have have(firstobs=2 keep=State dt1 dt2 rename=(State=_State dt1=_dt1 dt2=_dt2));
 output;
 if State=_State and dt1=_dt1 then do;
  do i=1 to intck('month',dt2,_dt2)-1;
   dt2=intnx('month',dt2,1);
   value1=.;value2=.;
   output;
 end;
end;
drop i _:;
run;

View solution in original post


All Replies
Super User
Posts: 19,815

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to acemanhattan

PROC TIMESERIES - if you dates are SAS dates and you have SAS/ETS licensed.

https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52

 

Contributor
Posts: 46

Re: Fill In Missing Dates for Each Possible Combination

It looks like I'll want to switch the format of my YYYYMM date values, yes? To something that SAS processes as a date?
Contributor
Posts: 33

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to acemanhattan

write out the two months to 2 datasets and then join them without a join statement.  That'll give you the cartesian product.

Contributor
Posts: 46

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to tomrvincent
@tomrvincent

Ignoring for a second that I don't necessarily know how to do that (though I understand what you're proposing), will the Cartesian product give me the combinations of STATE, LOB1, LOB2 I need?

I was thinking of a solution similar to this, where I generate the range of Month2 across the range of Month1, and then join to my dataset, but I couldn't figure out how I'd get the other necessary values to be anything other than blank.
Contributor
Posts: 33

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to acemanhattan

no reason why it can't. I just did it by dt and value with no joins at all...got 26 records.

 

here's what I did (joining by state is all I tried):

 

/* get distinct '1' values */

PROC SQL;
   CREATE TABLE WORK.q1 AS
   SELECT DISTINCT t1.State,
          t1.LOB1,
          t1.Value1,
          t1.Dt1
      FROM WORK.BOOK1 t1;


/* get distinct '2' values */  

CREATE TABLE WORK.q2 AS
   SELECT DISTINCT t1.State,
          t1.LOB2,
          t1.Value2,
          t1.Dt2
      FROM WORK.BOOK1 t1;
/* join together by state only */

   CREATE TABLE WORK.q3 AS
   SELECT t1.State,
          t1.LOB1,
          t1.Value1,
          t1.Dt1,
          t2.State AS State1,
          t2.LOB2,
          t2.Value2,
          t2.Dt2
      FROM WORK.Q1 t1
           INNER JOIN WORK.Q2 t2 ON (t1.State = t2.State);
QUIT;

 

results in 294 records.

Respected Advisor
Posts: 4,925

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to acemanhattan

Looks like all you need is to fill in the missing dt2 months (with value1 and value2 set to missing for the added dates)

 

This would do it:

 

data have;
input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2;
dt1 = mdy(mod(xdt1,100),1,int(xdt1/100));
dt2 = mdy(mod(xdt2,100),1,int(xdt2/100));
format dt1 dt2 yymm7.;
drop x:;
datalines; 
ID Biz1 Biz1A 201401 201301 127 1415 
ID Biz1 Biz1A 201401 201305 654 4673 
ID Biz1 Biz1A 201401 201307 98 299 
ID Biz1 Biz1A 201401 201309 123 30 
ID Biz1 Biz1A 201401 201310 145 1152 
ID Biz1 Biz1A 201401 201312 125 24 
ID Biz1 Biz1A 201401 201401 6787 19534 
ID Biz1 Biz1A 201402 201301 43 381 
ID Biz1 Biz1A 201402 201302 11 4 
ID Biz1 Biz1A 201402 201307 124 285 
ID Biz1 Biz1A 201402 201311 45 357 
ID Biz1 Biz1A 201402 201312 3 1 
;

proc sort data=have; by state lob1 lob2 dt1 dt2; run;

data want;
dt2 = constant("BIG");
do until(last.dt1);
    set have(rename=(dt2=_dt2 value1=_value1 value2=_value2)); 
    by state lob1 lob2 dt1;
    call missing(value1, value2);
    do while(dt2 < _dt2);
        output;
        dt2 = intnx("month",dt2,1);
        end;
    dt2 = _dt2;
    value1 = _value1;
    value2 = _value2;
    output;
    dt2 = intnx("month",dt2,1);
    end;
format dt2 yymm7.;
drop _: ;
run;

PG
Solution
‎08-16-2017 11:19 AM
Super User
Posts: 10,028

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to acemanhattan
data have;
input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2;
dt1 = mdy(mod(xdt1,100),1,int(xdt1/100));
dt2 = mdy(mod(xdt2,100),1,int(xdt2/100));
format dt1 dt2 yymmn6.;
drop x:;
datalines; 
ID Biz1 Biz1A 201401 201301 127 1415 
ID Biz1 Biz1A 201401 201305 654 4673 
ID Biz1 Biz1A 201401 201307 98 299 
ID Biz1 Biz1A 201401 201309 123 30 
ID Biz1 Biz1A 201401 201310 145 1152 
ID Biz1 Biz1A 201401 201312 125 24 
ID Biz1 Biz1A 201401 201401 6787 19534 
ID Biz1 Biz1A 201402 201301 43 381 
ID Biz1 Biz1A 201402 201302 11 4 
ID Biz1 Biz1A 201402 201307 124 285 
ID Biz1 Biz1A 201402 201311 45 357 
ID Biz1 Biz1A 201402 201312 3 1 
;

data want;
 merge have have(firstobs=2 keep=State dt1 dt2 rename=(State=_State dt1=_dt1 dt2=_dt2));
 output;
 if State=_State and dt1=_dt1 then do;
  do i=1 to intck('month',dt2,_dt2)-1;
   dt2=intnx('month',dt2,1);
   value1=.;value2=.;
   output;
 end;
end;
drop i _:;
run;

Contributor
Posts: 46

Re: Fill In Missing Dates for Each Possible Combination

[ Edited ]

@Ksharp This is great, and elegant, except it only seems to work on data if I've manually entered it like you did in your post. For example

 

 

data DUMMY;
   set personal.all_states_data_201707_A01;
   if INFORCE=201301;
   If mod(Pdmo,2)=0;
run;

 

 Generates this Dummy data set:

state LOB LOB_CUST_TYPE INFORCE PDMO PDAMT TOTMEMS
ID Biz1 BIZ1A 201301 201302 $282,691.60 0.666666667
ID Biz1 BIZ1A 201301 201304 $42,402.79 0.5
ID Biz1 BIZ1A 201301 201306 $3,665.05 -0.5
ID Biz1 BIZ1A 201301 201308 $5,117.68 0.666666667
ID Biz1 BIZ1A 201301 201310 $1,664.52 0
ID Biz1 BIZ1A 201301 201312 $2,639.59 0
ID Biz1 BIZ1A 201301 201402 $1,239.98 0
ID Biz1 BIZ1A 201301 201404 $8,548.61 0
ID Biz1 BIZ1A 201301 201406 ($7.31) 0
ID Biz1 BIZ1A 201301 201408 $1,337.29 0
ID Biz1 BIZ1A 201301 201410 ($44.43) 0
ID Biz1 BIZ1A 201301 201412 $75.20 0
ID Biz1 BIZ1A 201301 201502 $49.75 0
ID Biz1 BIZ1A 201301 201508 ($13.33) 0
ID Biz1 BIZ1A 201301 201510 ($11.43) 0
ID Biz1 BIZ1A 201301 201702 ($22.50) 0

 

 

But then, this

 

data results_1;
 merge dummy dummy(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));
 output;
 if State=_State and INFORCE=_INFORCE then do;
  do i=1 to intck('month',PDMO,_PDMO)-1;
   PDMO=intnx('month',PDMO,1);
   PDAMT=.;TOTMEMS=.;
   output;
 end;
end;
drop i _:;
run;

Generates these results

state LOB LOB_CUST_TYPE INFORCE PDMO PDAMT TOTMEMS
ID Biz1 BIZ1A 201301 201302 $282,691.60 0.666666667
ID Biz1 BIZ1A 201301 201304 $42,402.79 0.5
ID Biz1 BIZ1A 201301 201306 $3,665.05 -0.5
ID Biz1 BIZ1A 201301 201308 $5,117.68 0.666666667
ID Biz1 BIZ1A 201301 201310 $1,664.52 0
ID Biz1 BIZ1A 201301 201312 $2,639.59 0
ID Biz1 BIZ1A 201301 201339 .   .
ID Biz1 BIZ1A 201301 201369 .   .
ID Biz1 BIZ1A 201301 201402 $1,239.98 0
ID Biz1 BIZ1A 201301 201404 $8,548.61 0
ID Biz1 BIZ1A 201301 201406 ($7.31) 0
ID Biz1 BIZ1A 201301 201408 $1,337.29 0
ID Biz1 BIZ1A 201301 201410 ($44.43) 0
ID Biz1 BIZ1A 201301 201412 $75.20 0
ID Biz1 BIZ1A 201301 201430 .   .
ID Biz1 BIZ1A 201301 201461 .   .
ID Biz1 BIZ1A 201301 201502 $49.75 0
ID Biz1 BIZ1A 201301 201508 ($13.33) 0
ID Biz1 BIZ1A 201301 201510 ($11.43) 0
ID Biz1 BIZ1A 201301 201522 .   .
ID Biz1 BIZ1A 201301 201553 .   .
ID Biz1 BIZ1A 201301 201583 .   .
ID Biz1 BIZ1A 201301 201614 .   .
ID Biz1 BIZ1A 201301 201645 .   .
ID Biz1 BIZ1A 201301 201702 ($22.50) 0

 

 

Yet, if I take the same data set and enter it in this fashiion, I get thte desired results.

 

data have;
input (State LOB LOB_CUST_TYPE) ($) INFORCE PDMO PDAMT TOTMEMS;
INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100));
PDMO = mdy(mod(PDMO,100),1,int(PDMO/100));
format INFORCE PDMO yymmn6.;
drop x:;
datalines; 
ID Biz1 BIZ1A 201301 201302 282691.6 0.6666666667
ID Biz1 BIZ1A 201301 201304 42402.79 0.5
ID Biz1 BIZ1A 201301 201306 3665.05 -0.5
ID Biz1 BIZ1A 201301 201308 5117.68 0.6666666667
ID Biz1 BIZ1A 201301 201310 1664.52 0
ID Biz1 BIZ1A 201301 201312 2639.59 0
ID Biz1 BIZ1A 201301 201402 1239.98 0
ID Biz1 BIZ1A 201301 201404 8548.61 0
ID Biz1 BIZ1A 201301 201406 -7.31 0
ID Biz1 BIZ1A 201301 201408 1337.29 0
ID Biz1 BIZ1A 201301 201410 -44.43 0
ID Biz1 BIZ1A 201301 201412 75.2 0
ID Biz1 BIZ1A 201301 201502 49.75 0
ID Biz1 BIZ1A 201301 201508 -13.33 0
ID Biz1 BIZ1A 201301 201510 -11.43 0
ID Biz1 BIZ1A 201301 201702 -22.5 0
;

data results_2;
 merge have have(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));
 output;
 if State=_State and INFORCE=_INFORCE then do;
  do i=1 to intck('month',PDMO,_PDMO)-1;
   PDMO=intnx('month',PDMO,1);
   PDAMT=.;TOTMEMS=.;
   output;
 end;
drop i _:;
run;

 

 

Super User
Posts: 10,028

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to acemanhattan
That is because PDMO is not a DATE type variable.

Chang it into DATE firstly as PG did .

INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100));
PDMO = mdy(mod(PDMO,100),1,int(PDMO/100));

Contributor
Posts: 46

Re: Fill In Missing Dates for Each Possible Combination

@Ksharp, where within the code below would you expect placing those two lines of code to give us the desired results? I've tried placing them between every space, and I get varied results, but none of them what I'd hope for.

data results_1;

merge dummy dummy(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));

 output;
 
 if State=_State and INFORCE=_INFORCE then do;

  do i=1 to intck('month',PDMO,_PDMO)-1;

   PDMO=intnx('month',PDMO,1);
 
   PDAMT=.;TOTMEMS=.;
   
   output;
 end;
end;
drop i _:;
run;
Super User
Posts: 10,028

Re: Fill In Missing Dates for Each Possible Combination

Posted in reply to acemanhattan
When generating the DUMMY table, change it into DATE type variable.
and your INFORCE and PDMO  is numeric variable.


data DUMMY;
   set personal.all_states_data_201707_A01;

INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100));
PDMO = mdy(mod(PDMO,100),1,int(PDMO/100));


run;



After that , run my code .

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 299 views
  • 3 likes
  • 5 in conversation