BookmarkSubscribeRSS Feed
brophymj
Quartz | Level 8

Hi

I have a dataset of the following form. My dataset ranges from 2005 to June 2015 and it's split into quarters i.e. 200501, 200502, 200503, 200504, 200601, ..., 201502

Basically I want to make sure that for every Class, Line, Cover, Status that there is always a row for every quarter. Where there is no data for a quarter for a particular class,line,cover,status I want to create a row and where and put a vakue of 0 in for the Exposure, ClaimAmt and ClaimNo.

ClassLineCoverStatusYearQtrExposureClaimAmtClaimNo
NIHaulageCompNB200801202002
NIOwn GoodsTPOEB20100211123344
NIMotorcycleTPFTNB20110420112342
NIPrivate CarCompNB201001231133441

One way of doing this would be to create a dataset with blank values in every case and add it onto my existsing dataset (as I will just be summing over exposure, claimamt and claimno).

There is 1 class, 10 lines, 4 cover types, 2 statuses and 30 quarters so the dataset would have 1*10*4*2*30 = 2400 rows. Is there a an easy way to do this?

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, first way I think of is to create a template dataset which has all rows required, from min to max.  Then merge that back to your data:

proc sql;

     create TMP as

     select     min (YEAR_QTR) as MIN,

                   max(YEAR_QTR) as MAX

     from        HAVE;

quit;

data tmp;

     set tmp;

     do year_qtr=min to max;   

          output;

     end;

run;

proc sql;

     create table WANT as

     select     A.*

     from       HAVE A

     full join   TMP B

     on          A.YEAR_QTR=B.YEAR_QTR;

quit;

   

Alternatively you could sort your dataset, then do something like:

data want (drop old_vars);

     set have;

     do i=lag(year_qtr) to (year_qtr - 1);

          new_vars set to missing; output;

     end;

     output actual record;

run;

So the point being, create new variables which will hold all the data, then loop over the gap between previous year_qtr to this one minus 1, output missing rows if that loop does anything, then output the actual record.  Note, posting test data in the form of a datastep would help writing code.

brophymj
Quartz | Level 8

Thanks RW9 but what about the non-numeric fields such as line cover status? In the dataset I need the range of min/max qtr but for each status in each cover in each line i.e. 2400 rows.

data test;
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr exposure claimamt claimno 8. ;
datalines ;
NI,HAULAGE,COMP,NB,200802,22,234,44
NI,OWN GOODS,ADFT,NB,201002,2233,3434,44
NI,MOTORCYLE,TPO,EB,201101,233,23434,2
NI,PRIVATE CAR,TPD,NB,201304,123,22234,44
NI,TAXI,TPO,EB,200601,23,1124,2

run;

brophymj
Quartz | Level 8

In this case, there is 5 different classes, 4 cover types, 2 status and the range of yearqtrs spans 30 qtrs. so 1200 rows.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Then just change the levels and merge:

proc sql;

     create TMP as

     select     COVER,

                    STATUS,

                    min (YEAR_QTR) as MIN,

                   max(YEAR_QTR) as MAX

     from        HAVE

     group by  COVER,

                    STATUS;

quit;

...

Full join TMP B on A.COVER=B.COVER and A.STATUS=B.STATUS and A.YEAR_QTR=B.YEAR_QTR

Ksharp
Super User

Code: Program

data test; 
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr : yyq. exposure claimamt claimno 8. ;
format yearqtr  yyqn.;
datalines ;
NI,HAULAGE,COMP,NB,2008Q02,22,234,44
NI,OWN GOODS,ADFT,NB,2010Q02,2233,3434,44
NI,MOTORCYLE,TPO,EB,2011Q01,233,23434,2
NI,PRIVATE CAR,TPD,NB,2013Q04,123,22234,44
NI,TAXI,TPO,EB,2006Q01,23,1124,2
;
run; 
data temp;
do year=2005 to 2014;
  do qtr=1 to 4;
   date=yyq(year,qtr);
   output;
  end;
end;
do qtr=1 to 2;
  date=yyq(2015,qtr);
  output;
end;
format date yyqn.;
keep date;
run;
proc sql;
create table want as
select a.*,b.exposure, b.claimamt, b.claimno
  from
  (select * from
   (select distinct class, line, cover, status from test),(select date from temp)) as a
  left join test as b
   on a.class=b.class and a.line=b.line and a.cover=b.cover and a.status=b.status and a.date=b.yearqtr ;
quit;
proc stdize data=want out=final_want reponly missing=0;run;

brophymj
Quartz | Level 8

Hi Xia

I think I've over complicated what I'm looking for.

Basically, I want to have 1200 rows. So for each distinct class, line cover etc. I want to create a row i.e. every combination of class, line, cover.

brophymj
Quartz | Level 8

It's ok I figured it out using arrays.

Thanks

FatCaptain
Fluorite | Level 6

You can do this with tables in SQL. This is a little easier from an administration point of view if you ever need to change your lists.

The example below only uses 2 categories for each of your variables but you'll get the drift.

data line ;

line = 'Motorcycle' ; output ;

line = 'Own Goods' ; output ;

run ;

data cover ;

cover = 'Comp' ; output ;

cover = 'TPO' ; output ;

run ;

data status ;

status = 'NB' ; output ;

status = 'EB' ; output ;

run ;

data yearqtr ;

yearqtr = 200801 ; output ;

yearqtr = 200802 ; output ;

run ;

proc sql ;

create table all as

select 'NI' as class,

    a.line,

    b.cover,

    c.status,

    d.yearqtr,

    0 as exposure,

    0 as claimamt,

    0 as claimno

from   line as a,

    cover as b,

    status as c,

    yearqtr as d ;

quit ;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1803 views
  • 3 likes
  • 4 in conversation