BookmarkSubscribeRSS Feed
brophymj
Quartz | Level 8

Hi

I have a dataset of the following form. My dataset ranges from 2005 to June 2015 and it's split into quarters i.e. 200501, 200502, 200503, 200504, 200601, ..., 201502

Basically I want to make sure that for every Class, Line, Cover, Status that there is always a row for every quarter. Where there is no data for a quarter for a particular class,line,cover,status I want to create a row and where and put a vakue of 0 in for the Exposure, ClaimAmt and ClaimNo.

ClassLineCoverStatusYearQtrExposureClaimAmtClaimNo
NIHaulageCompNB200801202002
NIOwn GoodsTPOEB20100211123344
NIMotorcycleTPFTNB20110420112342
NIPrivate CarCompNB201001231133441

One way of doing this would be to create a dataset with blank values in every case and add it onto my existsing dataset (as I will just be summing over exposure, claimamt and claimno).

There is 1 class, 10 lines, 4 cover types, 2 statuses and 30 quarters so the dataset would have 1*10*4*2*30 = 2400 rows. Is there a an easy way to do this?

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, first way I think of is to create a template dataset which has all rows required, from min to max.  Then merge that back to your data:

proc sql;

     create TMP as

     select     min (YEAR_QTR) as MIN,

                   max(YEAR_QTR) as MAX

     from        HAVE;

quit;

data tmp;

     set tmp;

     do year_qtr=min to max;   

          output;

     end;

run;

proc sql;

     create table WANT as

     select     A.*

     from       HAVE A

     full join   TMP B

     on          A.YEAR_QTR=B.YEAR_QTR;

quit;

   

Alternatively you could sort your dataset, then do something like:

data want (drop old_vars);

     set have;

     do i=lag(year_qtr) to (year_qtr - 1);

          new_vars set to missing; output;

     end;

     output actual record;

run;

So the point being, create new variables which will hold all the data, then loop over the gap between previous year_qtr to this one minus 1, output missing rows if that loop does anything, then output the actual record.  Note, posting test data in the form of a datastep would help writing code.

brophymj
Quartz | Level 8

Thanks RW9 but what about the non-numeric fields such as line cover status? In the dataset I need the range of min/max qtr but for each status in each cover in each line i.e. 2400 rows.

data test;
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr exposure claimamt claimno 8. ;
datalines ;
NI,HAULAGE,COMP,NB,200802,22,234,44
NI,OWN GOODS,ADFT,NB,201002,2233,3434,44
NI,MOTORCYLE,TPO,EB,201101,233,23434,2
NI,PRIVATE CAR,TPD,NB,201304,123,22234,44
NI,TAXI,TPO,EB,200601,23,1124,2

run;

brophymj
Quartz | Level 8

In this case, there is 5 different classes, 4 cover types, 2 status and the range of yearqtrs spans 30 qtrs. so 1200 rows.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Then just change the levels and merge:

proc sql;

     create TMP as

     select     COVER,

                    STATUS,

                    min (YEAR_QTR) as MIN,

                   max(YEAR_QTR) as MAX

     from        HAVE

     group by  COVER,

                    STATUS;

quit;

...

Full join TMP B on A.COVER=B.COVER and A.STATUS=B.STATUS and A.YEAR_QTR=B.YEAR_QTR

Ksharp
Super User

Code: Program

data test; 
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr : yyq. exposure claimamt claimno 8. ;
format yearqtr  yyqn.;
datalines ;
NI,HAULAGE,COMP,NB,2008Q02,22,234,44
NI,OWN GOODS,ADFT,NB,2010Q02,2233,3434,44
NI,MOTORCYLE,TPO,EB,2011Q01,233,23434,2
NI,PRIVATE CAR,TPD,NB,2013Q04,123,22234,44
NI,TAXI,TPO,EB,2006Q01,23,1124,2
;
run; 
data temp;
do year=2005 to 2014;
  do qtr=1 to 4;
   date=yyq(year,qtr);
   output;
  end;
end;
do qtr=1 to 2;
  date=yyq(2015,qtr);
  output;
end;
format date yyqn.;
keep date;
run;
proc sql;
create table want as
select a.*,b.exposure, b.claimamt, b.claimno
  from
  (select * from
   (select distinct class, line, cover, status from test),(select date from temp)) as a
  left join test as b
   on a.class=b.class and a.line=b.line and a.cover=b.cover and a.status=b.status and a.date=b.yearqtr ;
quit;
proc stdize data=want out=final_want reponly missing=0;run;

brophymj
Quartz | Level 8

Hi Xia

I think I've over complicated what I'm looking for.

Basically, I want to have 1200 rows. So for each distinct class, line cover etc. I want to create a row i.e. every combination of class, line, cover.

brophymj
Quartz | Level 8

It's ok I figured it out using arrays.

Thanks

FatCaptain
Fluorite | Level 6

You can do this with tables in SQL. This is a little easier from an administration point of view if you ever need to change your lists.

The example below only uses 2 categories for each of your variables but you'll get the drift.

data line ;

line = 'Motorcycle' ; output ;

line = 'Own Goods' ; output ;

run ;

data cover ;

cover = 'Comp' ; output ;

cover = 'TPO' ; output ;

run ;

data status ;

status = 'NB' ; output ;

status = 'EB' ; output ;

run ;

data yearqtr ;

yearqtr = 200801 ; output ;

yearqtr = 200802 ; output ;

run ;

proc sql ;

create table all as

select 'NI' as class,

    a.line,

    b.cover,

    c.status,

    d.yearqtr,

    0 as exposure,

    0 as claimamt,

    0 as claimno

from   line as a,

    cover as b,

    status as c,

    yearqtr as d ;

quit ;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1227 views
  • 3 likes
  • 4 in conversation