Hi
I have a dataset of the following form. My dataset ranges from 2005 to June 2015 and it's split into quarters i.e. 200501, 200502, 200503, 200504, 200601, ..., 201502
Basically I want to make sure that for every Class, Line, Cover, Status that there is always a row for every quarter. Where there is no data for a quarter for a particular class,line,cover,status I want to create a row and where and put a vakue of 0 in for the Exposure, ClaimAmt and ClaimNo.
Class | Line | Cover | Status | YearQtr | Exposure | ClaimAmt | ClaimNo |
---|---|---|---|---|---|---|---|
NI | Haulage | Comp | NB | 200801 | 20 | 200 | 2 |
NI | Own Goods | TPO | EB | 201002 | 11 | 12334 | 4 |
NI | Motorcycle | TPFT | NB | 201104 | 201 | 1234 | 2 |
NI | Private Car | Comp | NB | 201001 | 231 | 13344 | 1 |
One way of doing this would be to create a dataset with blank values in every case and add it onto my existsing dataset (as I will just be summing over exposure, claimamt and claimno).
There is 1 class, 10 lines, 4 cover types, 2 statuses and 30 quarters so the dataset would have 1*10*4*2*30 = 2400 rows. Is there a an easy way to do this?
Well, first way I think of is to create a template dataset which has all rows required, from min to max. Then merge that back to your data:
proc sql;
create TMP as
select min (YEAR_QTR) as MIN,
max(YEAR_QTR) as MAX
from HAVE;
quit;
data tmp;
set tmp;
do year_qtr=min to max;
output;
end;
run;
proc sql;
create table WANT as
select A.*
from HAVE A
full join TMP B
on A.YEAR_QTR=B.YEAR_QTR;
quit;
Alternatively you could sort your dataset, then do something like:
data want (drop old_vars);
set have;
do i=lag(year_qtr) to (year_qtr - 1);
new_vars set to missing; output;
end;
output actual record;
run;
So the point being, create new variables which will hold all the data, then loop over the gap between previous year_qtr to this one minus 1, output missing rows if that loop does anything, then output the actual record. Note, posting test data in the form of a datastep would help writing code.
Thanks RW9 but what about the non-numeric fields such as line cover status? In the dataset I need the range of min/max qtr but for each status in each cover in each line i.e. 2400 rows.
data test;
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr exposure claimamt claimno 8. ;
datalines ;
NI,HAULAGE,COMP,NB,200802,22,234,44
NI,OWN GOODS,ADFT,NB,201002,2233,3434,44
NI,MOTORCYLE,TPO,EB,201101,233,23434,2
NI,PRIVATE CAR,TPD,NB,201304,123,22234,44
NI,TAXI,TPO,EB,200601,23,1124,2
run;
In this case, there is 5 different classes, 4 cover types, 2 status and the range of yearqtrs spans 30 qtrs. so 1200 rows.
Then just change the levels and merge:
proc sql;
create TMP as
select COVER,
STATUS,
min (YEAR_QTR) as MIN,
max(YEAR_QTR) as MAX
from HAVE
group by COVER,
STATUS;
quit;
...
Full join TMP B on A.COVER=B.COVER and A.STATUS=B.STATUS and A.YEAR_QTR=B.YEAR_QTR
data test;
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr : yyq. exposure claimamt claimno 8. ;
format yearqtr yyqn.;
datalines ;
NI,HAULAGE,COMP,NB,2008Q02,22,234,44
NI,OWN GOODS,ADFT,NB,2010Q02,2233,3434,44
NI,MOTORCYLE,TPO,EB,2011Q01,233,23434,2
NI,PRIVATE CAR,TPD,NB,2013Q04,123,22234,44
NI,TAXI,TPO,EB,2006Q01,23,1124,2
;
run;
data temp;
do year=2005 to 2014;
do qtr=1 to 4;
date=yyq(year,qtr);
output;
end;
end;
do qtr=1 to 2;
date=yyq(2015,qtr);
output;
end;
format date yyqn.;
keep date;
run;
proc sql;
create table want as
select a.*,b.exposure, b.claimamt, b.claimno
from
(select * from
(select distinct class, line, cover, status from test),(select date from temp)) as a
left join test as b
on a.class=b.class and a.line=b.line and a.cover=b.cover and a.status=b.status and a.date=b.yearqtr ;
quit;
proc stdize data=want out=final_want reponly missing=0;run;
Hi Xia
I think I've over complicated what I'm looking for.
Basically, I want to have 1200 rows. So for each distinct class, line cover etc. I want to create a row i.e. every combination of class, line, cover.
It's ok I figured it out using arrays.
Thanks
You can do this with tables in SQL. This is a little easier from an administration point of view if you ever need to change your lists.
The example below only uses 2 categories for each of your variables but you'll get the drift.
data line ;
line = 'Motorcycle' ; output ;
line = 'Own Goods' ; output ;
run ;
data cover ;
cover = 'Comp' ; output ;
cover = 'TPO' ; output ;
run ;
data status ;
status = 'NB' ; output ;
status = 'EB' ; output ;
run ;
data yearqtr ;
yearqtr = 200801 ; output ;
yearqtr = 200802 ; output ;
run ;
proc sql ;
create table all as
select 'NI' as class,
a.line,
b.cover,
c.status,
d.yearqtr,
0 as exposure,
0 as claimamt,
0 as claimno
from line as a,
cover as b,
status as c,
yearqtr as d ;
quit ;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.