I've inherited some code that is not working as intended. Can someone decipher what this code is doing. It reads in excel spreadsheet and is supposed to calculate a mean over all quarters for each month on book value (1-36). I'd like to understand how it's doing this (if it is) and the reason for lag terms and initializing most recent quarter with _n. The log prints first. and last. values but I don't follow. The result is not as expected.
%macro cum_corate36(grade);
data LC_co_&grade.36; set LC_CO."&grade.36$A1:AQ37"N; run;
proc sort; by MOB; run;
data LC_co1_&grade.36; set LC_co_&grade.36; by MOB;
  _7Q3_n=_7Q3*1; 
  array num  _7Q3n _7Q2 _7Q1 _6Q4 _6Q3 _6Q2 _6Q1 _5Q4 _5Q3 _5Q2 _5Q1 _4Q4 _4Q3 _4Q2 _4Q1 _3Q4 _3Q3 _3Q2 _3Q1 _2Q4 _2Q3 _2Q2 _2Q1 _1Q4 _1Q3 _1Q2 _1Q1 _0Q4 _0Q3 _0Q2 _0Q1 _9Q4 _9Q3 _9Q2 _9Q1 _8Q4 _8Q3 _8Q2 _8Q1;
  array numlag1 lag1_7Q3 lag1_7Q2 lag1_7Q1 lag1_6Q4 lag1_6Q3 lag1_6Q2 lag1_6Q1 lag1_5Q4 lag1_5Q3 lag1_5Q2 lag1_5Q1 lag1_4Q4 lag1_4Q3 lag1_4Q2 lag1_4Q1 lag1_3Q4 lag1_3Q3 lag1_3Q2 lag1_3Q1 lag1_2Q4 lag1_2Q3 lag1_2Q2 lag1_2Q1 lag1_1Q4 lag1_1Q3 lag1_1Q2 lag1_1Q1 lag1_0Q4 lag1_0Q3 lag1_0Q2 lag1_0Q1 lag1_9Q4 lag1_9Q3 lag1_9Q2 lag1_9Q1 lag1_8Q4 lag1_8Q3 lag1_8Q2 lag1_8Q1;
  array numC C_7Q3 C_7Q2 C_7Q1 C_6Q4 C_6Q3 C_6Q2 C_6Q1 C_5Q4 C_5Q3 C_5Q2 C_5Q1 C_4Q4 C_4Q3 C_4Q2 C_4Q1 C_3Q4 C_3Q3 C_3Q2 C_3Q1 C_2Q4 C_2Q3 C_2Q2 C_2Q1 C_1Q4 C_1Q3 C_1Q2 C_1Q1 C_0Q4 C_0Q3 C_0Q2 C_0Q1 C_9Q4 C_9Q3 C_9Q2 C_9Q1 C_8Q4 C_8Q3 C_8Q2 C_8Q1;
  do over num;
  	numlag1=lag(num);
	numC=num-numlag1;
	if MOB eq 1 then numC=.;
  end;
  C=mean(of C_:);
  keep MOB C;
run;
%mend;
%cum_corate36(A);
That is a great example of why you SHOULD NOT store your data in a wide format.
I would recommend re-writing it to transpose your data to a long format and you'll easily be able to calculate your summaries using PROC MEANS, dynamically, and then transpose back for reporting/display purposes or to meet requirements.
If you want help debugging this code, you should explain what it is doing that's not what you want, because we don't know what you 'intended'. And you should include the log, after setting the following options so that messages are pasted to the log:
options mprint symbolgen mlogic;
@kriscrowe wrote:
I've inherited some code that is not working as intended. Can someone decipher what this code is doing. It reads in excel spreadsheet and is supposed to calculate a mean over all quarters for each month on book value (1-36). I'd like to understand how it's doing this (if it is) and the reason for lag terms and initializing most recent quarter with _n. The log prints first. and last. values but I don't follow. The result is not as expected.
%macro cum_corate36(grade); data LC_co_&grade.36; set LC_CO."&grade.36$A1:AQ37"N; run; proc sort; by MOB; run; data LC_co1_&grade.36; set LC_co_&grade.36; by MOB; _7Q3_n=_7Q3*1; array num _7Q3n _7Q2 _7Q1 _6Q4 _6Q3 _6Q2 _6Q1 _5Q4 _5Q3 _5Q2 _5Q1 _4Q4 _4Q3 _4Q2 _4Q1 _3Q4 _3Q3 _3Q2 _3Q1 _2Q4 _2Q3 _2Q2 _2Q1 _1Q4 _1Q3 _1Q2 _1Q1 _0Q4 _0Q3 _0Q2 _0Q1 _9Q4 _9Q3 _9Q2 _9Q1 _8Q4 _8Q3 _8Q2 _8Q1; array numlag1 lag1_7Q3 lag1_7Q2 lag1_7Q1 lag1_6Q4 lag1_6Q3 lag1_6Q2 lag1_6Q1 lag1_5Q4 lag1_5Q3 lag1_5Q2 lag1_5Q1 lag1_4Q4 lag1_4Q3 lag1_4Q2 lag1_4Q1 lag1_3Q4 lag1_3Q3 lag1_3Q2 lag1_3Q1 lag1_2Q4 lag1_2Q3 lag1_2Q2 lag1_2Q1 lag1_1Q4 lag1_1Q3 lag1_1Q2 lag1_1Q1 lag1_0Q4 lag1_0Q3 lag1_0Q2 lag1_0Q1 lag1_9Q4 lag1_9Q3 lag1_9Q2 lag1_9Q1 lag1_8Q4 lag1_8Q3 lag1_8Q2 lag1_8Q1; array numC C_7Q3 C_7Q2 C_7Q1 C_6Q4 C_6Q3 C_6Q2 C_6Q1 C_5Q4 C_5Q3 C_5Q2 C_5Q1 C_4Q4 C_4Q3 C_4Q2 C_4Q1 C_3Q4 C_3Q3 C_3Q2 C_3Q1 C_2Q4 C_2Q3 C_2Q2 C_2Q1 C_1Q4 C_1Q3 C_1Q2 C_1Q1 C_0Q4 C_0Q3 C_0Q2 C_0Q1 C_9Q4 C_9Q3 C_9Q2 C_9Q1 C_8Q4 C_8Q3 C_8Q2 C_8Q1; do over num; numlag1=lag(num); numC=num-numlag1; if MOB eq 1 then numC=.; end; C=mean(of C_:); keep MOB C; run; %mend; %cum_corate36(A);
That is a great example of why you SHOULD NOT store your data in a wide format.
I would recommend re-writing it to transpose your data to a long format and you'll easily be able to calculate your summaries using PROC MEANS, dynamically, and then transpose back for reporting/display purposes or to meet requirements.
If you want help debugging this code, you should explain what it is doing that's not what you want, because we don't know what you 'intended'. And you should include the log, after setting the following options so that messages are pasted to the log:
options mprint symbolgen mlogic;
@kriscrowe wrote:
I've inherited some code that is not working as intended. Can someone decipher what this code is doing. It reads in excel spreadsheet and is supposed to calculate a mean over all quarters for each month on book value (1-36). I'd like to understand how it's doing this (if it is) and the reason for lag terms and initializing most recent quarter with _n. The log prints first. and last. values but I don't follow. The result is not as expected.
%macro cum_corate36(grade); data LC_co_&grade.36; set LC_CO."&grade.36$A1:AQ37"N; run; proc sort; by MOB; run; data LC_co1_&grade.36; set LC_co_&grade.36; by MOB; _7Q3_n=_7Q3*1; array num _7Q3n _7Q2 _7Q1 _6Q4 _6Q3 _6Q2 _6Q1 _5Q4 _5Q3 _5Q2 _5Q1 _4Q4 _4Q3 _4Q2 _4Q1 _3Q4 _3Q3 _3Q2 _3Q1 _2Q4 _2Q3 _2Q2 _2Q1 _1Q4 _1Q3 _1Q2 _1Q1 _0Q4 _0Q3 _0Q2 _0Q1 _9Q4 _9Q3 _9Q2 _9Q1 _8Q4 _8Q3 _8Q2 _8Q1; array numlag1 lag1_7Q3 lag1_7Q2 lag1_7Q1 lag1_6Q4 lag1_6Q3 lag1_6Q2 lag1_6Q1 lag1_5Q4 lag1_5Q3 lag1_5Q2 lag1_5Q1 lag1_4Q4 lag1_4Q3 lag1_4Q2 lag1_4Q1 lag1_3Q4 lag1_3Q3 lag1_3Q2 lag1_3Q1 lag1_2Q4 lag1_2Q3 lag1_2Q2 lag1_2Q1 lag1_1Q4 lag1_1Q3 lag1_1Q2 lag1_1Q1 lag1_0Q4 lag1_0Q3 lag1_0Q2 lag1_0Q1 lag1_9Q4 lag1_9Q3 lag1_9Q2 lag1_9Q1 lag1_8Q4 lag1_8Q3 lag1_8Q2 lag1_8Q1; array numC C_7Q3 C_7Q2 C_7Q1 C_6Q4 C_6Q3 C_6Q2 C_6Q1 C_5Q4 C_5Q3 C_5Q2 C_5Q1 C_4Q4 C_4Q3 C_4Q2 C_4Q1 C_3Q4 C_3Q3 C_3Q2 C_3Q1 C_2Q4 C_2Q3 C_2Q2 C_2Q1 C_1Q4 C_1Q3 C_1Q2 C_1Q1 C_0Q4 C_0Q3 C_0Q2 C_0Q1 C_9Q4 C_9Q3 C_9Q2 C_9Q1 C_8Q4 C_8Q3 C_8Q2 C_8Q1; do over num; numlag1=lag(num); numC=num-numlag1; if MOB eq 1 then numC=.; end; C=mean(of C_:); keep MOB C; run; %mend; %cum_corate36(A);
Proc transpose worked with one problem. The first occurrence of 7Q3 is zero which is causing this quarter to be missing from every MOB(month on book) group.
For example, I would expect to see 17Q3 in transposed results and it is missing for MOB. 17Q3 value is zero for MOB=1 but >0 for MOB=2 in original table. I think this may be why _7Q3_n=_7Q3*1; appeared in original code. What is best fix to include 17Q3 for all MOB?
| MOB | qtr | COL | 
| 1 | 17Q2 | |
| 1 | 17Q1 | |
| 1 | 16Q4 | |
| 1 | 16Q3 | |
| 1 | 16Q2 | |
| 1 | 16Q1 | |
| 1 | 15Q4 | |
| 1 | 15Q3 | |
| 2 | 17Q2 | |
| 2 | 17Q1 | |
| 2 | 16Q4 | |
| 2 | 16Q3 | |
| 2 | 16Q2 | |
| 2 | 16Q1 | |
| 2 | 15Q4 | |
| 2 | 15Q3 | 
What does your starting data look like?
So the main thing the data step is doing is taking a row of data values and finding the change in value between the current row's data and the previous row's data. It then takes the mean of those differences. So if you had data with two columns like this:
Row  Col1 Col2
 1    10   20 
 2    20   40 
 3    10   30 Since there are two data columns it needs to make 2*2=4 new variables to store the lags and the differences and then one more final variable to store the mean of the differences for this row.
Row  Col1 Col2  Lag1 Lag2 Dif1 Dif2 Mean
 1    10   20     .    .    .    .    .
 2    20   40    10   20   10   20   15
 3    10   30    20   40  -10  -10  -10
I am not sure what the MOB variable is and why it wants to set DIF vars to missing when MOB=1. Perhaps MOB=1 is like the first observation in a group? In which case the LAG values have come from a different group and so you want to make the differences be missing because they are meaningless if there is no previous value. Like the are in the first overall row (and in my example above). They should probably also set the lag value to missing, but perhaps they don't really care about the lag value.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
