BookmarkSubscribeRSS Feed
brophymj
Quartz | Level 8

Hi

I have a dataset of the following form. My dataset ranges from 2005 to June 2015 and it's split into quarters i.e. 200501, 200502, 200503, 200504, 200601, ..., 201502

Basically I want to make sure that for every Class, Line, Cover, Status that there is always a row for every quarter. Where there is no data for a quarter for a particular class,line,cover,status I want to create a row and where and put a vakue of 0 in for the Exposure, ClaimAmt and ClaimNo.

ClassLineCoverStatusYearQtrExposureClaimAmtClaimNo
NIHaulageCompNB200801202002
NIOwn GoodsTPOEB20100211123344
NIMotorcycleTPFTNB20110420112342
NIPrivate CarCompNB201001231133441

One way of doing this would be to create a dataset with blank values in every case and add it onto my existsing dataset (as I will just be summing over exposure, claimamt and claimno).

There is 1 class, 10 lines, 4 cover types, 2 statuses and 30 quarters so the dataset would have 1*10*4*2*30 = 2400 rows. Is there a an easy way to do this?

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, first way I think of is to create a template dataset which has all rows required, from min to max.  Then merge that back to your data:

proc sql;

     create TMP as

     select     min (YEAR_QTR) as MIN,

                   max(YEAR_QTR) as MAX

     from        HAVE;

quit;

data tmp;

     set tmp;

     do year_qtr=min to max;   

          output;

     end;

run;

proc sql;

     create table WANT as

     select     A.*

     from       HAVE A

     full join   TMP B

     on          A.YEAR_QTR=B.YEAR_QTR;

quit;

   

Alternatively you could sort your dataset, then do something like:

data want (drop old_vars);

     set have;

     do i=lag(year_qtr) to (year_qtr - 1);

          new_vars set to missing; output;

     end;

     output actual record;

run;

So the point being, create new variables which will hold all the data, then loop over the gap between previous year_qtr to this one minus 1, output missing rows if that loop does anything, then output the actual record.  Note, posting test data in the form of a datastep would help writing code.

brophymj
Quartz | Level 8

Thanks RW9 but what about the non-numeric fields such as line cover status? In the dataset I need the range of min/max qtr but for each status in each cover in each line i.e. 2400 rows.

data test;
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr exposure claimamt claimno 8. ;
datalines ;
NI,HAULAGE,COMP,NB,200802,22,234,44
NI,OWN GOODS,ADFT,NB,201002,2233,3434,44
NI,MOTORCYLE,TPO,EB,201101,233,23434,2
NI,PRIVATE CAR,TPD,NB,201304,123,22234,44
NI,TAXI,TPO,EB,200601,23,1124,2

run;

brophymj
Quartz | Level 8

In this case, there is 5 different classes, 4 cover types, 2 status and the range of yearqtrs spans 30 qtrs. so 1200 rows.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Then just change the levels and merge:

proc sql;

     create TMP as

     select     COVER,

                    STATUS,

                    min (YEAR_QTR) as MIN,

                   max(YEAR_QTR) as MAX

     from        HAVE

     group by  COVER,

                    STATUS;

quit;

...

Full join TMP B on A.COVER=B.COVER and A.STATUS=B.STATUS and A.YEAR_QTR=B.YEAR_QTR

Ksharp
Super User

Code: Program

data test; 
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr : yyq. exposure claimamt claimno 8. ;
format yearqtr  yyqn.;
datalines ;
NI,HAULAGE,COMP,NB,2008Q02,22,234,44
NI,OWN GOODS,ADFT,NB,2010Q02,2233,3434,44
NI,MOTORCYLE,TPO,EB,2011Q01,233,23434,2
NI,PRIVATE CAR,TPD,NB,2013Q04,123,22234,44
NI,TAXI,TPO,EB,2006Q01,23,1124,2
;
run; 
data temp;
do year=2005 to 2014;
  do qtr=1 to 4;
   date=yyq(year,qtr);
   output;
  end;
end;
do qtr=1 to 2;
  date=yyq(2015,qtr);
  output;
end;
format date yyqn.;
keep date;
run;
proc sql;
create table want as
select a.*,b.exposure, b.claimamt, b.claimno
  from
  (select * from
   (select distinct class, line, cover, status from test),(select date from temp)) as a
  left join test as b
   on a.class=b.class and a.line=b.line and a.cover=b.cover and a.status=b.status and a.date=b.yearqtr ;
quit;
proc stdize data=want out=final_want reponly missing=0;run;

brophymj
Quartz | Level 8

Hi Xia

I think I've over complicated what I'm looking for.

Basically, I want to have 1200 rows. So for each distinct class, line cover etc. I want to create a row i.e. every combination of class, line, cover.

brophymj
Quartz | Level 8

It's ok I figured it out using arrays.

Thanks

FatCaptain
Fluorite | Level 6

You can do this with tables in SQL. This is a little easier from an administration point of view if you ever need to change your lists.

The example below only uses 2 categories for each of your variables but you'll get the drift.

data line ;

line = 'Motorcycle' ; output ;

line = 'Own Goods' ; output ;

run ;

data cover ;

cover = 'Comp' ; output ;

cover = 'TPO' ; output ;

run ;

data status ;

status = 'NB' ; output ;

status = 'EB' ; output ;

run ;

data yearqtr ;

yearqtr = 200801 ; output ;

yearqtr = 200802 ; output ;

run ;

proc sql ;

create table all as

select 'NI' as class,

    a.line,

    b.cover,

    c.status,

    d.yearqtr,

    0 as exposure,

    0 as claimamt,

    0 as claimno

from   line as a,

    cover as b,

    status as c,

    yearqtr as d ;

quit ;

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1061 views
  • 3 likes
  • 4 in conversation