Creating blanks rows where there is no data

brophymj · Posted 08-26-2015 05:46 AM

Hi

I have a dataset of the following form. My dataset ranges from 2005 to June 2015 and it's split into quarters i.e. 200501, 200502, 200503, 200504, 200601, ..., 201502

Basically I want to make sure that for every Class, Line, Cover, Status that there is always a row for every quarter. Where there is no data for a quarter for a particular class,line,cover,status I want to create a row and where and put a vakue of 0 in for the Exposure, ClaimAmt and ClaimNo.

Class	Line	Cover	Status	YearQtr	Exposure	ClaimAmt	ClaimNo
NI	Haulage	Comp	NB	200801	20	200	2
NI	Own Goods	TPO	EB	201002	11	12334	4
NI	Motorcycle	TPFT	NB	201104	201	1234	2
NI	Private Car	Comp	NB	201001	231	13344	1

One way of doing this would be to create a dataset with blank values in every case and add it onto my existsing dataset (as I will just be summing over exposure, claimamt and claimno).

There is 1 class, 10 lines, 4 cover types, 2 statuses and 30 quarters so the dataset would have 1*10*4*2*30 = 2400 rows. Is there a an easy way to do this?

RW9 · Posted 08-26-2015 06:06 AM

Well, first way I think of is to create a template dataset which has all rows required, from min to max. Then merge that back to your data:

proc sql;

create TMP as

select min (YEAR_QTR) as MIN,

max(YEAR_QTR) as MAX

from HAVE;

quit;

data tmp;

set tmp;

do year_qtr=min to max;

output;

end;

run;

proc sql;

create table WANT as

select A.*

from HAVE A

full join TMP B

on A.YEAR_QTR=B.YEAR_QTR;

quit;

Alternatively you could sort your dataset, then do something like:

data want (drop old_vars);

set have;

do i=lag(year_qtr) to (year_qtr - 1);

new_vars set to missing; output;

end;

output actual record;

run;

So the point being, create new variables which will hold all the data, then loop over the gap between previous year_qtr to this one minus 1, output missing rows if that loop does anything, then output the actual record. Note, posting test data in the form of a datastep would help writing code.

brophymj · Posted 08-26-2015 06:22 AM

Thanks RW9 but what about the non-numeric fields such as line cover status? In the dataset I need the range of min/max qtr but for each status in each cover in each line i.e. 2400 rows.

data test;
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr exposure claimamt claimno 8. ;
datalines ;
NI,HAULAGE,COMP,NB,200802,22,234,44
NI,OWN GOODS,ADFT,NB,201002,2233,3434,44
NI,MOTORCYLE,TPO,EB,201101,233,23434,2
NI,PRIVATE CAR,TPD,NB,201304,123,22234,44
NI,TAXI,TPO,EB,200601,23,1124,2

run;

brophymj · Posted 08-26-2015 06:34 AM

In this case, there is 5 different classes, 4 cover types, 2 status and the range of yearqtrs spans 30 qtrs. so 1200 rows.

RW9 · Posted 08-26-2015 07:25 AM

Then just change the levels and merge:

proc sql;

create TMP as

select COVER,

STATUS,

min (YEAR_QTR) as MIN,

max(YEAR_QTR) as MAX

from HAVE

group by COVER,

STATUS;

quit;

...

Full join TMP B on A.COVER=B.COVER and A.STATUS=B.STATUS and A.YEAR_QTR=B.YEAR_QTR

Ksharp · Posted 08-26-2015 08:37 AM

Code: Program

data test; 
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr : yyq. exposure claimamt claimno 8. ;
format yearqtr  yyqn.;
datalines ;
NI,HAULAGE,COMP,NB,2008Q02,22,234,44
NI,OWN GOODS,ADFT,NB,2010Q02,2233,3434,44
NI,MOTORCYLE,TPO,EB,2011Q01,233,23434,2
NI,PRIVATE CAR,TPD,NB,2013Q04,123,22234,44
NI,TAXI,TPO,EB,2006Q01,23,1124,2
;
run;  
data temp;
 do year=2005 to 2014;
  do qtr=1 to 4;
   date=yyq(year,qtr);
   output;
  end;
 end;
 do qtr=1 to 2;
  date=yyq(2015,qtr);
  output;
 end;
 format date yyqn.;
 keep date;
run;
proc sql;
create table want as
 select a.*,b.exposure, b.claimamt, b.claimno
  from 
  (select * from
   (select distinct class, line, cover, status from test),(select date from temp)) as a
  left join test as b 
   on a.class=b.class and a.line=b.line and a.cover=b.cover and a.status=b.status and a.date=b.yearqtr ;
quit;
proc stdize data=want out=final_want reponly missing=0;run;

brophymj · Posted 08-26-2015 09:21 AM

Hi Xia

I think I've over complicated what I'm looking for.

Basically, I want to have 1200 rows. So for each distinct class, line cover etc. I want to create a row i.e. every combination of class, line, cover.

brophymj · Posted 08-26-2015 09:53 AM

It's ok I figured it out using arrays.

Thanks

FatCaptain · Posted 08-26-2015 01:15 PM

You can do this with tables in SQL. This is a little easier from an administration point of view if you ever need to change your lists.

The example below only uses 2 categories for each of your variables but you'll get the drift.

data line ;

line = 'Motorcycle' ; output ;

line = 'Own Goods' ; output ;

run ;

data cover ;

cover = 'Comp' ; output ;

cover = 'TPO' ; output ;

run ;

data status ;

status = 'NB' ; output ;

status = 'EB' ; output ;

run ;

data yearqtr ;

yearqtr = 200801 ; output ;

yearqtr = 200802 ; output ;

run ;

proc sql ;

create table all as

select 'NI' as class,

a.line,

b.cover,

c.status,

d.yearqtr,

0 as exposure,

0 as claimamt,

0 as claimno

from line as a,

cover as b,

status as c,

yearqtr as d ;

quit ;