Hi all,
I want to "expand" a dataset (without using PROC EXPAND) by adding missing observations based on TRNS_MONTH and TRNS_QTR by each ID group.
In the new added rows Colour and Weight needs to copy down their values while the AMTS (AMT_1 to AMT_4 needs to be set equal to zero.
Expansion needs to be carried up to the MAX TRNS_MONTH & TRNS_QTR as they are determined by ID
704 | green | ow | 201112 | 2011Q4 |
Thank you very much in advance
Nik
DATA HAVE | ||||||||
ID | Colour | Weight | TRNS_MONTH | TRNS_QTR | AMT_1 | AMT_2 | AMT_3 | AMT_4 |
811 | blue | ow | 201108 | 2011Q3 | 4900 | 0 | 0 | 0 |
811 | blue | ow | 201109 | 2011Q3 | 0 | 4580 | 230 | 700 |
704 | green | ow | 201003 | 2010Q1 | 33400 | 0 | 450 | 0 |
704 | green | ow | 201008 | 2010Q3 | 11 | 48000 | 0 | 0 |
704 | green | ow | 201012 | 2010Q4 | 345 | 5000 | 0 | -8900 |
704 | green | ow | 201112 | 2011Q4 | 23300 | 0 | 0 | 2145 |
38 | red | ow | 201012 | 2010Q4 | 3300 | 0 | 0 | 0 |
38 | red | ow | 201101 | 2011Q1 | 0 | 1300 | 0 | 560 |
DATA WANT | ||||||||
ID | Colour | Weight | TRNS_MONTH | TRNS_QTR | AMT_1 | AMT_2 | AMT_3 | AMT_4 |
811 | blue | ow | 201108 | 2011Q3 | 4900 | 0 | 0 | 0 |
811 | blue | ow | 201109 | 2011Q3 | 0 | 4580 | 230 | 700 |
811 | blue | ow | 201110 | 2011Q4 | 0 | 0 | 0 | 0 |
811 | blue | ow | 201111 | 2011Q4 | 0 | 0 | 0 | 0 |
811 | blue | ow | 201112 | 2011Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201003 | 2010Q1 | 33400 | 0 | 450 | 0 |
704 | green | ow | 201004 | 2010Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201005 | 2010Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201006 | 2010Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201007 | 2010Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201008 | 2010Q3 | 11 | 48000 | 0 | 0 |
704 | green | ow | 201009 | 2010Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201010 | 2010Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201011 | 2010Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201012 | 2010Q4 | 345 | 5000 | 0 | -8900 |
704 | green | ow | 201101 | 2011Q1 | 0 | 0 | 0 | 0 |
704 | green | ow | 201102 | 2011Q1 | 0 | 0 | 0 | 0 |
704 | green | ow | 201103 | 2011Q1 | 0 | 0 | 0 | 0 |
704 | green | ow | 201104 | 2011Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201105 | 2011Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201106 | 2011Q2 | 0 | 0 | 0 | 0 |
704 | green | ow | 201107 | 2011Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201108 | 2011Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201109 | 2011Q3 | 0 | 0 | 0 | 0 |
704 | green | ow | 201110 | 2011Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201111 | 2011Q4 | 0 | 0 | 0 | 0 |
704 | green | ow | 201112 | 2011Q4 | 23300 | 0 | 0 | 2145 |
38 | red | ow | 201012 | 2010Q4 | 3300 | 0 | 0 | 0 |
38 | red | ow | 201101 | 2011Q1 | 0 | 1300 | 0 | 560 |
38 | red | ow | 201102 | 2011Q1 | 0 | 0 | 0 | 0 |
38 | red | ow | 201103 | 2011Q1 | 0 | 0 | 0 | 0 |
38 | red | ow | 201104 | 2011Q2 | 0 | 0 | 0 | 0 |
38 | red | ow | 201105 | 2011Q2 | 0 | 0 | 0 | 0 |
38 | red | ow | 201106 | 2011Q2 | 0 | 0 | 0 | 0 |
38 | red | ow | 201107 | 2011Q3 | 0 | 0 | 0 | 0 |
38 | red | ow | 201108 | 2011Q3 | 0 | 0 | 0 | 0 |
38 | red | ow | 201109 | 2011Q3 | 0 | 0 | 0 | 0 |
38 | red | ow | 201110 | 2011Q4 | 0 | 0 | 0 | 0 |
38 | red | ow | 201111 | 2011Q4 | 0 | 0 | 0 | 0 |
38 | red | ow | 201112 | 2011Q4 | 0 | 0 | 0 | 0 |
Hi,
That would make the coding simpler, as DOWs may not be needed. However, as lazy as I am, I will just make some minor change to my current code:
DATA HAVE;
input ID$ Colour$ Weight$ TRNS_MONTH :yymmn6. TRNS_QTR :yyq6. AMT_1 AMT_2 AMT_3 AMT_4;
format trns_month yymmn6.
trns_qtr yyq6.;
cards;
811 blue ow 201108 2011Q3 4900 0 0 0
811 blue ow 201109 2011Q3 0 4580 230 700
704 green ow 201003 2010Q1 33400 0 450 0
704 green ow 201008 2010Q3 11 48000 0 0
704 green ow 201012 2010Q4 345 5000 0 -8900
704 green ow 201112 2011Q4 23300 0 0 2145
38 red ow 201012 2010Q4 3300 0 0 0
38 red ow 201101 2011Q1 0 1300 0 560
;
proc sql;
select max(trns_month) into :mtm from have;
quit;
data want ;
format tm yymmn6.;
do until (last.id);
set have;
by id notsorted;
if first.id then _st=trns_month;
if last.id then do;
_ld=trns_month;
end;
end;
_i=0; tm =0;
do until (last.id);
set have;
by id notsorted;
_amt1=amt_1;
_amt2=amt_2;
_amt3=amt_3;
_amt4=amt_4;
do _i=_i by 1 while (tm < &mtm);
tm=intnx('month',_st,_i,'b');
tq=put(tm,yyq6.);
if trns_month=tm then do;
amt_1=_amt1;
amt_2=_amt2;
amt_3=_amt3;
amt_4=_amt4;
output;
if tm ne _ld then do;
_i+1;
leave;
end;
end;
else do;
amt_1=0;
amt_2=0;
amt_3=0;
amt_4=0;
output;
end;
end;
end;
rename tm=trns_month tq=trns_qtr;
drop trns_qtr trns_month _:;
run;
Haikuo
Well, you will probably want to pay for SAS/ETS, for not doing the following:
DATA HAVE;
input ID$ Colour$ Weight$ TRNS_MONTH :yymmn6. TRNS_QTR :yyq6. AMT_1 AMT_2 AMT_3 AMT_4;
format trns_month yymmn6.
trns_qtr yyq6.;
cards;
811 blue ow 201108 2011Q3 4900 0 0 0
811 blue ow 201109 2011Q3 0 4580 230 700
704 green ow 201003 2010Q1 33400 0 450 0
704 green ow 201008 2010Q3 11 48000 0 0
704 green ow 201012 2010Q4 345 5000 0 -8900
704 green ow 201112 2011Q4 23300 0 0 2145
38 red ow 201012 2010Q4 3300 0 0 0
38 red ow 201101 2011Q1 0 1300 0 560
;
data want ;
format tm yymmn6.;
do until (last.id);
set have;
by id notsorted;
if first.id then _st=trns_month;
if last.id then do;
_ed=intnx('month',intnx('year',trns_month,1,'b'),-1,'b');
_ld=trns_month;
end;
end;
_i=0; tm =0;
do until (last.id);
set have;
by id notsorted;
_amt1=amt_1;
_amt2=amt_2;
_amt3=amt_3;
_amt4=amt_4;
do _i=_i by 1 while (tm < _ed);
tm=intnx('month',_st,_i,'b');
tq=put(tm,yyq6.);
if trns_month=tm then do;
amt_1=_amt1;
amt_2=_amt2;
amt_3=_amt3;
amt_4=_amt4;
output;
if tm ne _ld then do;
_i+1;
leave;
end;
end;
else do;
amt_1=0;
amt_2=0;
amt_3=0;
amt_4=0;
output;
end;
end;
end;
rename tm=trns_month tq=trns_qtr;
drop trns_qtr trns_month _:;
run;
Good Luck,
Haikuo
Hi Hai Kuo,
Thank you for your code. I might have not been very clear but I want each and every ID group to "expand" up and including the Maximum TRNS_MONTH of the dataset, in which case is 201112.
Thank you again.
Nick
Hi,
That would make the coding simpler, as DOWs may not be needed. However, as lazy as I am, I will just make some minor change to my current code:
DATA HAVE;
input ID$ Colour$ Weight$ TRNS_MONTH :yymmn6. TRNS_QTR :yyq6. AMT_1 AMT_2 AMT_3 AMT_4;
format trns_month yymmn6.
trns_qtr yyq6.;
cards;
811 blue ow 201108 2011Q3 4900 0 0 0
811 blue ow 201109 2011Q3 0 4580 230 700
704 green ow 201003 2010Q1 33400 0 450 0
704 green ow 201008 2010Q3 11 48000 0 0
704 green ow 201012 2010Q4 345 5000 0 -8900
704 green ow 201112 2011Q4 23300 0 0 2145
38 red ow 201012 2010Q4 3300 0 0 0
38 red ow 201101 2011Q1 0 1300 0 560
;
proc sql;
select max(trns_month) into :mtm from have;
quit;
data want ;
format tm yymmn6.;
do until (last.id);
set have;
by id notsorted;
if first.id then _st=trns_month;
if last.id then do;
_ld=trns_month;
end;
end;
_i=0; tm =0;
do until (last.id);
set have;
by id notsorted;
_amt1=amt_1;
_amt2=amt_2;
_amt3=amt_3;
_amt4=amt_4;
do _i=_i by 1 while (tm < &mtm);
tm=intnx('month',_st,_i,'b');
tq=put(tm,yyq6.);
if trns_month=tm then do;
amt_1=_amt1;
amt_2=_amt2;
amt_3=_amt3;
amt_4=_amt4;
output;
if tm ne _ld then do;
_i+1;
leave;
end;
end;
else do;
amt_1=0;
amt_2=0;
amt_3=0;
amt_4=0;
output;
end;
end;
end;
rename tm=trns_month tq=trns_qtr;
drop trns_qtr trns_month _:;
run;
Haikuo
hi ...another idea (if you don't mind the IDs in ascending order) ...
proc sort data=have out=temp;
by id trns_month;
run;
data want (drop=y m dt start);
set temp (rename=(trns_month=dt));
by id;
if first.id then do;
amt_1=0; amt_2=0; amt_3=0; amt_4=0;
do y=year(dt) to 2011;
if y eq year(dt) then start = month(dt);
else start = 1;
do m=start to 12;
trns_month = mdy(m,1,y);
trns_qtr = trns_month;
output;
end;
end;
end;
format trns_month yymmn6. trns_qtr yyq6.;
run;
data want;
update want temp;
by id trns_month;
run;
DATA HAVE; input ID$ Colour$ Weight$ TRNS_MONTH :yymmn6. TRNS_QTR :yyq6. AMT_1 AMT_2 AMT_3 AMT_4; format trns_month yymmn6. trns_qtr yyq6.; cards; 811 blue ow 201108 2011Q3 4900 0 0 0 811 blue ow 201109 2011Q3 0 4580 230 700 704 green ow 201003 2010Q1 33400 0 450 0 704 green ow 201008 2010Q3 11 48000 0 0 704 green ow 201012 2010Q4 345 5000 0 -8900 704 green ow 201112 2011Q4 23300 0 0 2145 38 red ow 201012 2010Q4 3300 0 0 0 38 red ow 201101 2011Q1 0 1300 0 560 ; run; data want(drop=i _:); merge have have(firstobs=2 keep=ID TRNS_MONTH rename=(ID=_ID TRNS_MONTH=_TRNS_MONTH)); output; if id=_id then do; do i=1 to intck('month',TRNS_MONTH,_TRNS_MONTH)-1; TRNS_MONTH=intnx('month',TRNS_MONTH,1); TRNS_QTR=TRNS_MONTH; amt_1=0; amt_2=0; amt_3=0; amt_4=0; output; end; end; else do; do i=1 to intck('month',TRNS_MONTH,'01dec2011'd); TRNS_MONTH=intnx('month',TRNS_MONTH,1); TRNS_QTR=TRNS_MONTH; amt_1=0; amt_2=0; amt_3=0; amt_4=0; output; end; end; run;
Ksharp
Thank you all for your solutions
Kind regards
Nikos
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.