Hi
I have a dataset of the following form. My dataset ranges from 2005 to June 2015 and it's split into quarters i.e. 200501, 200502, 200503, 200504, 200601, ..., 201502
Basically I want to make sure that for every Class, Line, Cover, Status that there is always a row for every quarter. Where there is no data for a quarter for a particular class,line,cover,status I want to create a row and where and put a vakue of 0 in for the Exposure, ClaimAmt and ClaimNo.
Class | Line | Cover | Status | YearQtr | Exposure | ClaimAmt | ClaimNo |
---|---|---|---|---|---|---|---|
NI | Haulage | Comp | NB | 200801 | 20 | 200 | 2 |
NI | Own Goods | TPO | EB | 201002 | 11 | 12334 | 4 |
NI | Motorcycle | TPFT | NB | 201104 | 201 | 1234 | 2 |
NI | Private Car | Comp | NB | 201001 | 231 | 13344 | 1 |
One way of doing this would be to create a dataset with blank values in every case and add it onto my existsing dataset (as I will just be summing over exposure, claimamt and claimno).
There is 1 class, 10 lines, 4 cover types, 2 statuses and 30 quarters so the dataset would have 1*10*4*2*30 = 2400 rows. Is there a an easy way to do this?
Well, first way I think of is to create a template dataset which has all rows required, from min to max. Then merge that back to your data:
proc sql;
create TMP as
select min (YEAR_QTR) as MIN,
max(YEAR_QTR) as MAX
from HAVE;
quit;
data tmp;
set tmp;
do year_qtr=min to max;
output;
end;
run;
proc sql;
create table WANT as
select A.*
from HAVE A
full join TMP B
on A.YEAR_QTR=B.YEAR_QTR;
quit;
Alternatively you could sort your dataset, then do something like:
data want (drop old_vars);
set have;
do i=lag(year_qtr) to (year_qtr - 1);
new_vars set to missing; output;
end;
output actual record;
run;
So the point being, create new variables which will hold all the data, then loop over the gap between previous year_qtr to this one minus 1, output missing rows if that loop does anything, then output the actual record. Note, posting test data in the form of a datastep would help writing code.
Thanks RW9 but what about the non-numeric fields such as line cover status? In the dataset I need the range of min/max qtr but for each status in each cover in each line i.e. 2400 rows.
data test;
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr exposure claimamt claimno 8. ;
datalines ;
NI,HAULAGE,COMP,NB,200802,22,234,44
NI,OWN GOODS,ADFT,NB,201002,2233,3434,44
NI,MOTORCYLE,TPO,EB,201101,233,23434,2
NI,PRIVATE CAR,TPD,NB,201304,123,22234,44
NI,TAXI,TPO,EB,200601,23,1124,2
run;
In this case, there is 5 different classes, 4 cover types, 2 status and the range of yearqtrs spans 30 qtrs. so 1200 rows.
Then just change the levels and merge:
proc sql;
create TMP as
select COVER,
STATUS,
min (YEAR_QTR) as MIN,
max(YEAR_QTR) as MAX
from HAVE
group by COVER,
STATUS;
quit;
...
Full join TMP B on A.COVER=B.COVER and A.STATUS=B.STATUS and A.YEAR_QTR=B.YEAR_QTR
data test;
INFILE DATALINES DLM=',';
input class $ line $ cover $ status $ yearqtr : yyq. exposure claimamt claimno 8. ;
format yearqtr yyqn.;
datalines ;
NI,HAULAGE,COMP,NB,2008Q02,22,234,44
NI,OWN GOODS,ADFT,NB,2010Q02,2233,3434,44
NI,MOTORCYLE,TPO,EB,2011Q01,233,23434,2
NI,PRIVATE CAR,TPD,NB,2013Q04,123,22234,44
NI,TAXI,TPO,EB,2006Q01,23,1124,2
;
run;
data temp;
do year=2005 to 2014;
do qtr=1 to 4;
date=yyq(year,qtr);
output;
end;
end;
do qtr=1 to 2;
date=yyq(2015,qtr);
output;
end;
format date yyqn.;
keep date;
run;
proc sql;
create table want as
select a.*,b.exposure, b.claimamt, b.claimno
from
(select * from
(select distinct class, line, cover, status from test),(select date from temp)) as a
left join test as b
on a.class=b.class and a.line=b.line and a.cover=b.cover and a.status=b.status and a.date=b.yearqtr ;
quit;
proc stdize data=want out=final_want reponly missing=0;run;
Hi Xia
I think I've over complicated what I'm looking for.
Basically, I want to have 1200 rows. So for each distinct class, line cover etc. I want to create a row i.e. every combination of class, line, cover.
It's ok I figured it out using arrays.
Thanks
You can do this with tables in SQL. This is a little easier from an administration point of view if you ever need to change your lists.
The example below only uses 2 categories for each of your variables but you'll get the drift.
data line ;
line = 'Motorcycle' ; output ;
line = 'Own Goods' ; output ;
run ;
data cover ;
cover = 'Comp' ; output ;
cover = 'TPO' ; output ;
run ;
data status ;
status = 'NB' ; output ;
status = 'EB' ; output ;
run ;
data yearqtr ;
yearqtr = 200801 ; output ;
yearqtr = 200802 ; output ;
run ;
proc sql ;
create table all as
select 'NI' as class,
a.line,
b.cover,
c.status,
d.yearqtr,
0 as exposure,
0 as claimamt,
0 as claimno
from line as a,
cover as b,
status as c,
yearqtr as d ;
quit ;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.