Hi all,
I want to "expand" a dataset (without using PROC EXPAND) by adding missing observations based on TRNS_MONTH and TRNS_QTR by each ID group.
In the new added rows Colour and Weight needs to copy down their values while the AMTS (AMT_1 to AMT_4 needs to be set equal to zero.
Expansion needs to be carried up to the MAX TRNS_MONTH & TRNS_QTR as they are determined by ID
704 | green | ow | 201112 | 2011Q4 |
Thank you very much in advance
Nik
DATA HAVE | ||||||||
ID | Colour | Weight | TRNS_MONTH | TRNS_QTR | AMT_1 | AMT_2 | AMT_3 | AMT_4 |
811 | blue | ow | 201108 | 2011Q3 | 4900 | 0 | 0 | 0 |
811 | blue | ow | 201109 | 2011Q3 | 0 | 4580 | 230 | 700 |
704 | green | ow | 201003 | 2010Q1 | 33400 | 0 | 450 | 0 |
704 | green | ow | 201008 | 2010Q3 | 11 | 48000 | 0 | 0 |
704 | green | ow | 201012 | 2010Q4 | 345 | 5000 | 0 | -8900 |
704 | green | ow | 201112 | 2011Q4 | 23300 | 0 | 0 | 2145 |
38 | red | ow | 201012 | 2010Q4 | 3300 | 0 | 0 | 0 |
38 | red | ow | 201101 | 2011Q1 | 0 | 1300 | 0 | 560 |
DATA WANT | ||||||||
ID | Colour | Weight | TRNS_MONTH | TRNS_QTR | AMT_1 | AMT_2 | AMT_3 | AMT_4 |
811 | blue | ow | 201108 | 2011Q3 | 4900 | 0 | 0 | 0 |
811 | blue | ow | 201109 | 2011Q3 | 0 | 4580 | 230 | 700 |
811 | blue | ow | 201110 | 2011Q4 | 0 | 0 | 0 | 0 |
811 | blue | ow | 201111 | 2011Q4 | 0 | 0 | 0 | 0 |
811 | blue | ow | 201112 | 2011Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201003 | 2010Q1 | 33400 | 0 | 450 | 0 |
704 | green | ow | 201004 | 2010Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201005 | 2010Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201006 | 2010Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201007 | 2010Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201008 | 2010Q3 | 11 | 48000 | 0 | 0 |
704 | green | ow | 201009 | 2010Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201010 | 2010Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201011 | 2010Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201012 | 2010Q4 | 345 | 5000 | 0 | -8900 |
704 | green | ow | 201101 | 2011Q1 | 0 | 0 | 0 | 0 |
704 | green | ow | 201102 | 2011Q1 | 0 | 0 | 0 | 0 |
704 | green | ow | 201103 | 2011Q1 | 0 | 0 | 0 | 0 |
704 | green | ow | 201104 | 2011Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201105 | 2011Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201106 | 2011Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201107 | 2011Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201108 | 2011Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201109 | 2011Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201110 | 2011Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201111 | 2011Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201112 | 2011Q4 | 23300 | 0 | 0 | 2145 |
38 | red | ow | 201012 | 2010Q4 | 3300 | 0 | 0 | 0 |
38 | red | ow | 201101 | 2011Q1 | 0 | 1300 | 0 | 560 |
38 | red | ow | 201102 | 2011Q1 | 0 | 0 | 0 | 0 |
38 | red | ow | 201103 | 2011Q1 | 0 | 0 | 0 | 0 |
38 | red | ow | 201104 | 2011Q2 | 0 | 0 | 0 | 0 |
38 | red | ow | 201105 | 2011Q2 | 0 | 0 | 0 | 0 |
38 | red | ow | 201106 | 2011Q2 | 0 | 0 | 0 | 0 |
38 | red | ow | 201107 | 2011Q3 | 0 | 0 | 0 | 0 |
38 | red | ow | 201108 | 2011Q3 | 0 | 0 | 0 | 0 |
38 | red | ow | 201109 | 2011Q3 | 0 | 0 | 0 | 0 |
38 | red | ow | 201110 | 2011Q4 | 0 | 0 | 0 | 0 |
38 | red | ow | 201111 | 2011Q4 | 0 | 0 | 0 | 0 |
38 | red | ow | 201112 | 2011Q4 | 0 | 0 | 0 | 0 |
Hi,
That would make the coding simpler, as DOWs may not be needed. However, as lazy as I am, I will just make some minor change to my current code:
DATA HAVE;
input ID$ Colour$ Weight$ TRNS_MONTH :yymmn6. TRNS_QTR :yyq6. AMT_1 AMT_2 AMT_3 AMT_4;
format trns_month yymmn6.
trns_qtr yyq6.;
cards;
811 blue ow 201108 2011Q3 4900 0 0 0
811 blue ow 201109 2011Q3 0 4580 230 700
704 green ow 201003 2010Q1 33400 0 450 0
704 green ow 201008 2010Q3 11 48000 0 0
704 green ow 201012 2010Q4 345 5000 0 -8900
704 green ow 201112 2011Q4 23300 0 0 2145
38 red ow 201012 2010Q4 3300 0 0 0
38 red ow 201101 2011Q1 0 1300 0 560
;
proc sql;
select max(trns_month) into :mtm from have;
quit;
data want ;
format tm yymmn6.;
do until (last.id);
set have;
by id notsorted;
if first.id then _st=trns_month;
if last.id then do;
_ld=trns_month;
end;
end;
_i=0; tm =0;
do until (last.id);
set have;
by id notsorted;
_amt1=amt_1;
_amt2=amt_2;
_amt3=amt_3;
_amt4=amt_4;
do _i=_i by 1 while (tm < &mtm);
tm=intnx('month',_st,_i,'b');
tq=put(tm,yyq6.);
if trns_month=tm then do;
amt_1=_amt1;
amt_2=_amt2;
amt_3=_amt3;
amt_4=_amt4;
output;
if tm ne _ld then do;
_i+1;
leave;
end;
end;
else do;
amt_1=0;
amt_2=0;
amt_3=0;
amt_4=0;
output;
end;
end;
end;
rename tm=trns_month tq=trns_qtr;
drop trns_qtr trns_month _:;
run;
Haikuo
Well, you will probably want to pay for SAS/ETS, for not doing the following:
DATA HAVE;
input ID$ Colour$ Weight$ TRNS_MONTH :yymmn6. TRNS_QTR :yyq6. AMT_1 AMT_2 AMT_3 AMT_4;
format trns_month yymmn6.
trns_qtr yyq6.;
cards;
811 blue ow 201108 2011Q3 4900 0 0 0
811 blue ow 201109 2011Q3 0 4580 230 700
704 green ow 201003 2010Q1 33400 0 450 0
704 green ow 201008 2010Q3 11 48000 0 0
704 green ow 201012 2010Q4 345 5000 0 -8900
704 green ow 201112 2011Q4 23300 0 0 2145
38 red ow 201012 2010Q4 3300 0 0 0
38 red ow 201101 2011Q1 0 1300 0 560
;
data want ;
format tm yymmn6.;
do until (last.id);
set have;
by id notsorted;
if first.id then _st=trns_month;
if last.id then do;
_ed=intnx('month',intnx('year',trns_month,1,'b'),-1,'b');
_ld=trns_month;
end;
end;
_i=0; tm =0;
do until (last.id);
set have;
by id notsorted;
_amt1=amt_1;
_amt2=amt_2;
_amt3=amt_3;
_amt4=amt_4;
do _i=_i by 1 while (tm < _ed);
tm=intnx('month',_st,_i,'b');
tq=put(tm,yyq6.);
if trns_month=tm then do;
amt_1=_amt1;
amt_2=_amt2;
amt_3=_amt3;
amt_4=_amt4;
output;
if tm ne _ld then do;
_i+1;
leave;
end;
end;
else do;
amt_1=0;
amt_2=0;
amt_3=0;
amt_4=0;
output;
end;
end;
end;
rename tm=trns_month tq=trns_qtr;
drop trns_qtr trns_month _:;
run;
Good Luck,
Haikuo
Hi Hai Kuo,
Thank you for your code. I might have not been very clear but I want each and every ID group to "expand" up and including the Maximum TRNS_MONTH of the dataset, in which case is 201112.
Thank you again.
Nick
Hi,
That would make the coding simpler, as DOWs may not be needed. However, as lazy as I am, I will just make some minor change to my current code:
DATA HAVE;
input ID$ Colour$ Weight$ TRNS_MONTH :yymmn6. TRNS_QTR :yyq6. AMT_1 AMT_2 AMT_3 AMT_4;
format trns_month yymmn6.
trns_qtr yyq6.;
cards;
811 blue ow 201108 2011Q3 4900 0 0 0
811 blue ow 201109 2011Q3 0 4580 230 700
704 green ow 201003 2010Q1 33400 0 450 0
704 green ow 201008 2010Q3 11 48000 0 0
704 green ow 201012 2010Q4 345 5000 0 -8900
704 green ow 201112 2011Q4 23300 0 0 2145
38 red ow 201012 2010Q4 3300 0 0 0
38 red ow 201101 2011Q1 0 1300 0 560
;
proc sql;
select max(trns_month) into :mtm from have;
quit;
data want ;
format tm yymmn6.;
do until (last.id);
set have;
by id notsorted;
if first.id then _st=trns_month;
if last.id then do;
_ld=trns_month;
end;
end;
_i=0; tm =0;
do until (last.id);
set have;
by id notsorted;
_amt1=amt_1;
_amt2=amt_2;
_amt3=amt_3;
_amt4=amt_4;
do _i=_i by 1 while (tm < &mtm);
tm=intnx('month',_st,_i,'b');
tq=put(tm,yyq6.);
if trns_month=tm then do;
amt_1=_amt1;
amt_2=_amt2;
amt_3=_amt3;
amt_4=_amt4;
output;
if tm ne _ld then do;
_i+1;
leave;
end;
end;
else do;
amt_1=0;
amt_2=0;
amt_3=0;
amt_4=0;
output;
end;
end;
end;
rename tm=trns_month tq=trns_qtr;
drop trns_qtr trns_month _:;
run;
Haikuo
hi ...another idea (if you don't mind the IDs in ascending order) ...
proc sort data=have out=temp;
by id trns_month;
run;
data want (drop=y m dt start);
set temp (rename=(trns_month=dt));
by id;
if first.id then do;
amt_1=0; amt_2=0; amt_3=0; amt_4=0;
do y=year(dt) to 2011;
if y eq year(dt) then start = month(dt);
else start = 1;
do m=start to 12;
trns_month = mdy(m,1,y);
trns_qtr = trns_month;
output;
end;
end;
end;
format trns_month yymmn6. trns_qtr yyq6.;
run;
data want;
update want temp;
by id trns_month;
run;
DATA HAVE; input ID$ Colour$ Weight$ TRNS_MONTH :yymmn6. TRNS_QTR :yyq6. AMT_1 AMT_2 AMT_3 AMT_4; format trns_month yymmn6. trns_qtr yyq6.; cards; 811 blue ow 201108 2011Q3 4900 0 0 0 811 blue ow 201109 2011Q3 0 4580 230 700 704 green ow 201003 2010Q1 33400 0 450 0 704 green ow 201008 2010Q3 11 48000 0 0 704 green ow 201012 2010Q4 345 5000 0 -8900 704 green ow 201112 2011Q4 23300 0 0 2145 38 red ow 201012 2010Q4 3300 0 0 0 38 red ow 201101 2011Q1 0 1300 0 560 ; run; data want(drop=i _:); merge have have(firstobs=2 keep=ID TRNS_MONTH rename=(ID=_ID TRNS_MONTH=_TRNS_MONTH)); output; if id=_id then do; do i=1 to intck('month',TRNS_MONTH,_TRNS_MONTH)-1; TRNS_MONTH=intnx('month',TRNS_MONTH,1); TRNS_QTR=TRNS_MONTH; amt_1=0; amt_2=0; amt_3=0; amt_4=0; output; end; end; else do; do i=1 to intck('month',TRNS_MONTH,'01dec2011'd); TRNS_MONTH=intnx('month',TRNS_MONTH,1); TRNS_QTR=TRNS_MONTH; amt_1=0; amt_2=0; amt_3=0; amt_4=0; output; end; end; run;
Ksharp
Thank you all for your solutions
Kind regards
Nikos
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.