I have a set of data which is orgainized like this
State | LOB1 | LOB2 | Dt1 | Dt2 | Value1 | Value2 |
. | . | . | . | . | . | . |
. | . | . | . | . | . | . |
ID | Biz1 | Biz1A | 201401 | 201301 | 127 | 1415 |
ID | Biz1 | Biz1A | 201401 | 201305 | 654 | 4673 |
ID | Biz1 | Biz1A | 201401 | 201307 | 98 | 299 |
ID | Biz1 | Biz1A | 201401 | 201309 | 123 | 30 |
ID | Biz1 | Biz1A | 201401 | 201310 | 145 | 1152 |
ID | Biz1 | Biz1A | 201401 | 201312 | 125 | 24 |
ID | Biz1 | Biz1A | 201401 | 201401 | 6787 | 19534 |
ID | Biz1 | Biz1A | 201402 | 201301 | 43 | 381 |
ID | Biz1 | Biz1A | 201402 | 201302 | 11 | 4 |
ID | Biz1 | Biz1A | 201402 | 201307 | 124 | 285 |
ID | Biz1 | Biz1A | 201402 | 201311 | 45 | 357 |
ID | Biz1 | Biz1A | 201402 | 201312 | 3 | 1 |
. | . | . | . | . | . | . |
. | . | . | . | . | . | . |
Where there are 4 choices of state (shown here is ID), LOB1 comes from 20 different values, and LOB2 comes from 10 diifferent values, Dt1 ranges from 201301-201707, and Dt2 ranges from 201301-201707 (but Dt2 <= Dt1). I basically need to fill in the data such that there are no missing date combinations for any of the combinations of State, LOB1, LOB2 that exist in my data.
The data above would look like this, where the "x" denote observations that have been added:
State | LOB1 | LOB2 | Dt1 | Dt2 | Value1 | Value2 | |
. | . | . | . | . | . | . | |
. | . | . | . | . | . | . | |
ID | Biz1 | Biz1A | 201401 | 201301 | 127 | 1415 | |
x | ID | Biz1 | Biz1A | 201401 | 201302 | . | |
x | ID | Biz1 | Biz1A | 201401 | 201303 | . | |
x | ID | Biz1 | Biz1A | 201401 | 201304 | . | |
ID | Biz1 | Biz1A | 201401 | 201305 | 654 | 4673 | |
x | ID | Biz1 | Biz1A | 201401 | 201306 | . | |
ID | Biz1 | Biz1A | 201401 | 201307 | 98 | 299 | |
x | ID | Biz1 | Biz1A | 201401 | 201308 | . | |
ID | Biz1 | Biz1A | 201401 | 201309 | 123 | 30 | |
ID | Biz1 | Biz1A | 201401 | 201310 | 145 | 1152 | |
x | ID | Biz1 | Biz1A | 201401 | 201311 | . | |
ID | Biz1 | Biz1A | 201401 | 201312 | 125 | 24 | |
ID | Biz1 | Biz1A | 201401 | 201401 | 6787 | 19534 | |
ID | Biz1 | Biz1A | 201402 | 201301 | 43 | 381 | |
ID | Biz1 | Biz1A | 201402 | 201302 | 11 | 4 | |
x | ID | Biz1 | Biz1A | 201402 | 201303 | . | |
x | ID | Biz1 | Biz1A | 201402 | 201304 | . | |
x | ID | Biz1 | Biz1A | 201402 | 201305 | . | |
x | ID | Biz1 | Biz1A | 201402 | 201306 | . | |
ID | Biz1 | Biz1A | 201402 | 201307 | 124 | 285 | |
x | ID | Biz1 | Biz1A | 201402 | 201308 | . | |
x | ID | Biz1 | Biz1A | 201402 | 201309 | . | |
x | ID | Biz1 | Biz1A | 201402 | 201310 | . | |
ID | Biz1 | Biz1A | 201402 | 201311 | 45 | 357 | |
ID | Biz1 | Biz1A | 201402 | 201312 | 3 | 1 | |
. | . | . | . | . | . | . | |
. | . | . | . | . | . | . |
Thanks in advance.
data have; input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2; dt1 = mdy(mod(xdt1,100),1,int(xdt1/100)); dt2 = mdy(mod(xdt2,100),1,int(xdt2/100)); format dt1 dt2 yymmn6.; drop x:; datalines; ID Biz1 Biz1A 201401 201301 127 1415 ID Biz1 Biz1A 201401 201305 654 4673 ID Biz1 Biz1A 201401 201307 98 299 ID Biz1 Biz1A 201401 201309 123 30 ID Biz1 Biz1A 201401 201310 145 1152 ID Biz1 Biz1A 201401 201312 125 24 ID Biz1 Biz1A 201401 201401 6787 19534 ID Biz1 Biz1A 201402 201301 43 381 ID Biz1 Biz1A 201402 201302 11 4 ID Biz1 Biz1A 201402 201307 124 285 ID Biz1 Biz1A 201402 201311 45 357 ID Biz1 Biz1A 201402 201312 3 1 ; data want; merge have have(firstobs=2 keep=State dt1 dt2 rename=(State=_State dt1=_dt1 dt2=_dt2)); output; if State=_State and dt1=_dt1 then do; do i=1 to intck('month',dt2,_dt2)-1; dt2=intnx('month',dt2,1); value1=.;value2=.; output; end; end; drop i _:; run;
PROC TIMESERIES - if you dates are SAS dates and you have SAS/ETS licensed.
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
write out the two months to 2 datasets and then join them without a join statement. That'll give you the cartesian product.
no reason why it can't. I just did it by dt and value with no joins at all...got 26 records.
here's what I did (joining by state is all I tried):
/* get distinct '1' values */
PROC SQL;
CREATE TABLE WORK.q1 AS
SELECT DISTINCT t1.State,
t1.LOB1,
t1.Value1,
t1.Dt1
FROM WORK.BOOK1 t1;
/* get distinct '2' values */
CREATE TABLE WORK.q2 AS
SELECT DISTINCT t1.State,
t1.LOB2,
t1.Value2,
t1.Dt2
FROM WORK.BOOK1 t1;
/* join together by state only */
CREATE TABLE WORK.q3 AS
SELECT t1.State,
t1.LOB1,
t1.Value1,
t1.Dt1,
t2.State AS State1,
t2.LOB2,
t2.Value2,
t2.Dt2
FROM WORK.Q1 t1
INNER JOIN WORK.Q2 t2 ON (t1.State = t2.State);
QUIT;
results in 294 records.
Looks like all you need is to fill in the missing dt2 months (with value1 and value2 set to missing for the added dates)
This would do it:
data have;
input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2;
dt1 = mdy(mod(xdt1,100),1,int(xdt1/100));
dt2 = mdy(mod(xdt2,100),1,int(xdt2/100));
format dt1 dt2 yymm7.;
drop x:;
datalines;
ID Biz1 Biz1A 201401 201301 127 1415
ID Biz1 Biz1A 201401 201305 654 4673
ID Biz1 Biz1A 201401 201307 98 299
ID Biz1 Biz1A 201401 201309 123 30
ID Biz1 Biz1A 201401 201310 145 1152
ID Biz1 Biz1A 201401 201312 125 24
ID Biz1 Biz1A 201401 201401 6787 19534
ID Biz1 Biz1A 201402 201301 43 381
ID Biz1 Biz1A 201402 201302 11 4
ID Biz1 Biz1A 201402 201307 124 285
ID Biz1 Biz1A 201402 201311 45 357
ID Biz1 Biz1A 201402 201312 3 1
;
proc sort data=have; by state lob1 lob2 dt1 dt2; run;
data want;
dt2 = constant("BIG");
do until(last.dt1);
set have(rename=(dt2=_dt2 value1=_value1 value2=_value2));
by state lob1 lob2 dt1;
call missing(value1, value2);
do while(dt2 < _dt2);
output;
dt2 = intnx("month",dt2,1);
end;
dt2 = _dt2;
value1 = _value1;
value2 = _value2;
output;
dt2 = intnx("month",dt2,1);
end;
format dt2 yymm7.;
drop _: ;
run;
data have; input (State LOB1 LOB2) ($) xdt1 xdt2 Value1 Value2; dt1 = mdy(mod(xdt1,100),1,int(xdt1/100)); dt2 = mdy(mod(xdt2,100),1,int(xdt2/100)); format dt1 dt2 yymmn6.; drop x:; datalines; ID Biz1 Biz1A 201401 201301 127 1415 ID Biz1 Biz1A 201401 201305 654 4673 ID Biz1 Biz1A 201401 201307 98 299 ID Biz1 Biz1A 201401 201309 123 30 ID Biz1 Biz1A 201401 201310 145 1152 ID Biz1 Biz1A 201401 201312 125 24 ID Biz1 Biz1A 201401 201401 6787 19534 ID Biz1 Biz1A 201402 201301 43 381 ID Biz1 Biz1A 201402 201302 11 4 ID Biz1 Biz1A 201402 201307 124 285 ID Biz1 Biz1A 201402 201311 45 357 ID Biz1 Biz1A 201402 201312 3 1 ; data want; merge have have(firstobs=2 keep=State dt1 dt2 rename=(State=_State dt1=_dt1 dt2=_dt2)); output; if State=_State and dt1=_dt1 then do; do i=1 to intck('month',dt2,_dt2)-1; dt2=intnx('month',dt2,1); value1=.;value2=.; output; end; end; drop i _:; run;
@Ksharp This is great, and elegant, except it only seems to work on data if I've manually entered it like you did in your post. For example
data DUMMY;
set personal.all_states_data_201707_A01;
if INFORCE=201301;
If mod(Pdmo,2)=0;
run;
Generates this Dummy data set:
state | LOB | LOB_CUST_TYPE | INFORCE | PDMO | PDAMT | TOTMEMS |
ID | Biz1 | BIZ1A | 201301 | 201302 | $282,691.60 | 0.666666667 |
ID | Biz1 | BIZ1A | 201301 | 201304 | $42,402.79 | 0.5 |
ID | Biz1 | BIZ1A | 201301 | 201306 | $3,665.05 | -0.5 |
ID | Biz1 | BIZ1A | 201301 | 201308 | $5,117.68 | 0.666666667 |
ID | Biz1 | BIZ1A | 201301 | 201310 | $1,664.52 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201312 | $2,639.59 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201402 | $1,239.98 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201404 | $8,548.61 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201406 | ($7.31) | 0 |
ID | Biz1 | BIZ1A | 201301 | 201408 | $1,337.29 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201410 | ($44.43) | 0 |
ID | Biz1 | BIZ1A | 201301 | 201412 | $75.20 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201502 | $49.75 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201508 | ($13.33) | 0 |
ID | Biz1 | BIZ1A | 201301 | 201510 | ($11.43) | 0 |
ID | Biz1 | BIZ1A | 201301 | 201702 | ($22.50) | 0 |
But then, this
data results_1;
merge dummy dummy(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));
output;
if State=_State and INFORCE=_INFORCE then do;
do i=1 to intck('month',PDMO,_PDMO)-1;
PDMO=intnx('month',PDMO,1);
PDAMT=.;TOTMEMS=.;
output;
end;
end;
drop i _:;
run;
Generates these results
state | LOB | LOB_CUST_TYPE | INFORCE | PDMO | PDAMT | TOTMEMS |
ID | Biz1 | BIZ1A | 201301 | 201302 | $282,691.60 | 0.666666667 |
ID | Biz1 | BIZ1A | 201301 | 201304 | $42,402.79 | 0.5 |
ID | Biz1 | BIZ1A | 201301 | 201306 | $3,665.05 | -0.5 |
ID | Biz1 | BIZ1A | 201301 | 201308 | $5,117.68 | 0.666666667 |
ID | Biz1 | BIZ1A | 201301 | 201310 | $1,664.52 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201312 | $2,639.59 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201339 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201369 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201402 | $1,239.98 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201404 | $8,548.61 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201406 | ($7.31) | 0 |
ID | Biz1 | BIZ1A | 201301 | 201408 | $1,337.29 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201410 | ($44.43) | 0 |
ID | Biz1 | BIZ1A | 201301 | 201412 | $75.20 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201430 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201461 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201502 | $49.75 | 0 |
ID | Biz1 | BIZ1A | 201301 | 201508 | ($13.33) | 0 |
ID | Biz1 | BIZ1A | 201301 | 201510 | ($11.43) | 0 |
ID | Biz1 | BIZ1A | 201301 | 201522 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201553 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201583 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201614 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201645 | . | . |
ID | Biz1 | BIZ1A | 201301 | 201702 | ($22.50) | 0 |
Yet, if I take the same data set and enter it in this fashiion, I get thte desired results.
data have;
input (State LOB LOB_CUST_TYPE) ($) INFORCE PDMO PDAMT TOTMEMS;
INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100));
PDMO = mdy(mod(PDMO,100),1,int(PDMO/100));
format INFORCE PDMO yymmn6.;
drop x:;
datalines;
ID Biz1 BIZ1A 201301 201302 282691.6 0.6666666667
ID Biz1 BIZ1A 201301 201304 42402.79 0.5
ID Biz1 BIZ1A 201301 201306 3665.05 -0.5
ID Biz1 BIZ1A 201301 201308 5117.68 0.6666666667
ID Biz1 BIZ1A 201301 201310 1664.52 0
ID Biz1 BIZ1A 201301 201312 2639.59 0
ID Biz1 BIZ1A 201301 201402 1239.98 0
ID Biz1 BIZ1A 201301 201404 8548.61 0
ID Biz1 BIZ1A 201301 201406 -7.31 0
ID Biz1 BIZ1A 201301 201408 1337.29 0
ID Biz1 BIZ1A 201301 201410 -44.43 0
ID Biz1 BIZ1A 201301 201412 75.2 0
ID Biz1 BIZ1A 201301 201502 49.75 0
ID Biz1 BIZ1A 201301 201508 -13.33 0
ID Biz1 BIZ1A 201301 201510 -11.43 0
ID Biz1 BIZ1A 201301 201702 -22.5 0
;
data results_2;
merge have have(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));
output;
if State=_State and INFORCE=_INFORCE then do;
do i=1 to intck('month',PDMO,_PDMO)-1;
PDMO=intnx('month',PDMO,1);
PDAMT=.;TOTMEMS=.;
output;
end;
drop i _:;
run;
That is because PDMO is not a DATE type variable. Chang it into DATE firstly as PG did . INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100)); PDMO = mdy(mod(PDMO,100),1,int(PDMO/100));
@Ksharp, where within the code below would you expect placing those two lines of code to give us the desired results? I've tried placing them between every space, and I get varied results, but none of them what I'd hope for.
data results_1;
merge dummy dummy(firstobs=2 keep=State INFORCE PDMO rename=(State=_State INFORCE=_INFORCE PDMO=_PDMO));
output;
if State=_State and INFORCE=_INFORCE then do;
do i=1 to intck('month',PDMO,_PDMO)-1;
PDMO=intnx('month',PDMO,1);
PDAMT=.;TOTMEMS=.;
output;
end;
end;
drop i _:;
run;
When generating the DUMMY table, change it into DATE type variable. and your INFORCE and PDMO is numeric variable. data DUMMY; set personal.all_states_data_201707_A01; INFORCE = mdy(mod(INFORCE,100),1,int(INFORCE/100)); PDMO = mdy(mod(PDMO,100),1,int(PDMO/100)); run; After that , run my code .
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.