08-10-2013 06:34 PM
Ladies and Gentlemen:
I have a panel data with variable ADD, BASE, and T (year). For every ADD on the time-series, adjust the BASE variable by adding 100% of ADD to BASE on year T, 75% to BASE in year T+1, 50% to BASE in year T+2, 25% to BASE in year T+3, and 0% to BASE in year T+4. Notice that ADD is also time-series variable also. Therefore, in a given year, there may be multiple adjustments of BASE from different years of ADD variable.
I tried to do this in PROC EXPAND, but have difficulty in realizing this. Could you guys give me some hint? Thank you!
Year | Original_Base | ADD | Allocation_for_ADD_ appeared_in Year1 | Allocation_for_ADD_ appeared_in_Year2 | Allocation_for_ADD_ appeared_in_Year3 | Allocation_for_ADD_ appeared_in_Year4 | Allocation_for_ADD_ appeared_in_Year5 | Allocation_for_ADD_ appeared_in_Year6 | ... | Adjusted Base |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1000 | 100 | 100 | 1100 | ||||||
2 | 1200 | 100 | 75 | 100 | 1375 | |||||
3 | 1700 | 200 | 50 | 75 | 200 | 2025 | ||||
4 | 200 | 400 | 25 | 50 | 150 | 400 | 825 | |||
5 | 1100 | 800 | 25 | 100 | 300 | 800 | 2350 | |||
6 | 1800 | 200 | 50 | 200 | 600 | 200 | 2450 | |||
7 | 1000 | 1000 | 100 | 400 | 150 | 1650 | ||||
... | ... | 200 | 100 | ... | ||||||
50 | ... | |||||||||
... | ||||||||||
... |
08-11-2013 05:44 AM
Hi Caveman529,
I have tried to interpret your question and I have come up with a solution involving the DATA step that may work for you. It makes use of the LAG function, which recalls the value of a variable from a previous observation.
Providing your initial dataset looks like this:
add | base | T |
6 | 4 | 1 |
3 | 6 | 2 |
6 | 1 | 3 |
2 | 3 | 4 |
7 | 6 | 5 |
Utilise this program:
data solution;
set dsn;
/* use lag function to remember previous values of ADD */
addlag1=lag1(add);
addlag2=lag2(add);
addlag3=lag3(add);
/* fill in missing values in the first few observations of lag variables*/
if _n_=1 then do;
addlag1=0;
addlag2=0;
addlag3=0;
end;
if _n_=2 then do;
addlag2=0;
addlag3=0;
end;
if _n_=3 then do;
addlag3=0;
end;
/* Make final calculation */
total=base + add + 0.75*addlag1 + 0.5*addlag2 + 0.25*addlag3;
run;
The final dataset should look like this:
add | base | T | addlag1 | addlag2 | addlag3 | total |
6 | 4 | 1 | 0 | 0 | 0 | 10 |
3 | 6 | 2 | 6 | 0 | 0 | 13.5 |
6 | 1 | 3 | 3 | 6 | 0 | 12.25 |
2 | 3 | 4 | 6 | 3 | 6 | 12.5 |
7 | 6 | 5 | 2 | 6 | 3 | 18.25 |
Apologies as I do not have SAS on my home computer and I could not test this program for bugs etc.
Let me know if this is what you were looking for.
Regards,
-Murray
08-11-2013 02:30 AM
Can you provide a sample of your data along with an example of the calculation? I'm struggling to picture what you are asking for . . . particularly the part about ". . . in a given year, there may be multiple adjustments of BASE from different years . . . ."
08-11-2013 05:44 AM
Hi Caveman529,
I have tried to interpret your question and I have come up with a solution involving the DATA step that may work for you. It makes use of the LAG function, which recalls the value of a variable from a previous observation.
Providing your initial dataset looks like this:
add | base | T |
6 | 4 | 1 |
3 | 6 | 2 |
6 | 1 | 3 |
2 | 3 | 4 |
7 | 6 | 5 |
Utilise this program:
data solution;
set dsn;
/* use lag function to remember previous values of ADD */
addlag1=lag1(add);
addlag2=lag2(add);
addlag3=lag3(add);
/* fill in missing values in the first few observations of lag variables*/
if _n_=1 then do;
addlag1=0;
addlag2=0;
addlag3=0;
end;
if _n_=2 then do;
addlag2=0;
addlag3=0;
end;
if _n_=3 then do;
addlag3=0;
end;
/* Make final calculation */
total=base + add + 0.75*addlag1 + 0.5*addlag2 + 0.25*addlag3;
run;
The final dataset should look like this:
add | base | T | addlag1 | addlag2 | addlag3 | total |
6 | 4 | 1 | 0 | 0 | 0 | 10 |
3 | 6 | 2 | 6 | 0 | 0 | 13.5 |
6 | 1 | 3 | 3 | 6 | 0 | 12.25 |
2 | 3 | 4 | 6 | 3 | 6 | 12.5 |
7 | 6 | 5 | 2 | 6 | 3 | 18.25 |
Apologies as I do not have SAS on my home computer and I could not test this program for bugs etc.
Let me know if this is what you were looking for.
Regards,
-Murray
08-11-2013 06:34 PM
Dear Fugue and Murray:
Sorry for the confusion. I have updated the table. The most difficult part is that for each ADD on the time-series, I have to split them into several pieces over time and then allocate then to the BASE. And I have to take care of the ADD on each year. So my brain is now fried..
08-11-2013 08:11 PM
Caveman,
Murray's solution looks perfectly viable. If it helps to conceptualize the problem, you could always preprocess the ADD values along these lines:
data adds;
set have (keep=add year);
total_add = add; output
total_add = 0.75 * add; year = year + 1; output;
total_add = 0.5 * add; year = year + 1; output;
total_add = 0.25 * add; year = year + 1; output;
run;
proc summary data=adds nway;
class year;
var total_add;
output out=total_adds (keep=year total_add) sum=;
run;
Then merge back in by year. But given that Murray's solution is working, this would be a lot of jumping through unnecessary hoops.
Good luck.
08-12-2013 02:12 AM
Hi Caveman,
My solution does not emulate yours as I do not have an additional column for each year. This is impractical using SAS as the we only need to look back 3 years so 3 instances of the LAG function will suffice.
Here are the same calculations done on your data.
Year | Original_Base | ADD | ADDlag1 | ADDlag2 | ADDlag3 | Adjusted Base |
1 | 1000 | 100 | 0 | 0 | 0 | 1100 |
2 | 1200 | 100 | 100 | 0 | 0 | 1375 |
3 | 1700 | 200 | 100 | 100 | 0 | 2025 |
4 | 200 | 400 | 200 | 100 | 100 | 825 |
5 | 1100 | 800 | 400 | 200 | 100 | 2325 |
6 | 1800 | 200 | 800 | 400 | 200 | 2850 |
7 | 1000 | 1000 | 200 | 800 | 400 | 2650 |
08-12-2013 03:16 PM
Murray's solution appears to correctly calculate the adjusted base as per your instructions. His results don't match your example results, but that seems to be due to some errors in your table . . .
/* create sample data as per example provided */
data have ;
input year base add ;
datalines;
1 1000 100
2 1200 100
3 1700 200
4 200 400
5 1100 800
6 1800 200
7 1000 1000
;
;;;;
/* precautionary sort */
proc sort data= have ;
by year ;
/* calculate adjusted base */
data want ;
set have ;
by year ;
addlag1 = lag1(add)*0.75;
addlag2 = lag2(add)*0.50;
addlag3 = lag3(add)*0.25;
/* calculate adjusted base */
adj_base = sum ( base, add, addlag1 , addlag2, addlag3 );
run ;
08-15-2013 02:02 PM
Thank you so much, guys!
Need further help from the community? Please ask a new question.