Array Processing; Problem calculating mean across all obs

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Array Processing; Problem calculating mean across all obs

I've inherited some code that is not working as intended. Can someone decipher what this code is doing. It reads in excel spreadsheet and is supposed to calculate a mean over all quarters for each month on book value (1-36). I'd like to understand how it's doing this (if it is) and the reason for lag terms and initializing most recent quarter with _n. The log prints first. and last. values but I don't follow. The result is not as expected.

%macro cum_corate36(grade);
data LC_co_&grade.36; set LC_CO."&grade.36$A1:AQ37"N; run;
proc sort; by MOB; run;

data LC_co1_&grade.36; set LC_co_&grade.36; by MOB;
  _7Q3_n=_7Q3*1; 
  array num  _7Q3n _7Q2 _7Q1 _6Q4 _6Q3 _6Q2 _6Q1 _5Q4 _5Q3 _5Q2 _5Q1 _4Q4 _4Q3 _4Q2 _4Q1 _3Q4 _3Q3 _3Q2 _3Q1 _2Q4 _2Q3 _2Q2 _2Q1 _1Q4 _1Q3 _1Q2 _1Q1 _0Q4 _0Q3 _0Q2 _0Q1 _9Q4 _9Q3 _9Q2 _9Q1 _8Q4 _8Q3 _8Q2 _8Q1;
  array numlag1 lag1_7Q3 lag1_7Q2 lag1_7Q1 lag1_6Q4 lag1_6Q3 lag1_6Q2 lag1_6Q1 lag1_5Q4 lag1_5Q3 lag1_5Q2 lag1_5Q1 lag1_4Q4 lag1_4Q3 lag1_4Q2 lag1_4Q1 lag1_3Q4 lag1_3Q3 lag1_3Q2 lag1_3Q1 lag1_2Q4 lag1_2Q3 lag1_2Q2 lag1_2Q1 lag1_1Q4 lag1_1Q3 lag1_1Q2 lag1_1Q1 lag1_0Q4 lag1_0Q3 lag1_0Q2 lag1_0Q1 lag1_9Q4 lag1_9Q3 lag1_9Q2 lag1_9Q1 lag1_8Q4 lag1_8Q3 lag1_8Q2 lag1_8Q1;
  array numC C_7Q3 C_7Q2 C_7Q1 C_6Q4 C_6Q3 C_6Q2 C_6Q1 C_5Q4 C_5Q3 C_5Q2 C_5Q1 C_4Q4 C_4Q3 C_4Q2 C_4Q1 C_3Q4 C_3Q3 C_3Q2 C_3Q1 C_2Q4 C_2Q3 C_2Q2 C_2Q1 C_1Q4 C_1Q3 C_1Q2 C_1Q1 C_0Q4 C_0Q3 C_0Q2 C_0Q1 C_9Q4 C_9Q3 C_9Q2 C_9Q1 C_8Q4 C_8Q3 C_8Q2 C_8Q1;
  do over num;
  	numlag1=lag(num);
	numC=num-numlag1;
	if MOB eq 1 then numC=.;
  end;
  C=mean(of C_:);
  keep MOB C;
run;
%mend;

%cum_corate36(A);

 


Accepted Solutions
Solution
‎02-25-2018 03:11 PM
Super User
Posts: 23,778

Re: Array Processing; Problem calculating mean across all obs

Posted in reply to kriscrowe

That is a great example of why you SHOULD NOT store your data in a wide format. 

 

I would recommend re-writing it to transpose your data to a long format and you'll easily be able to calculate your summaries using PROC MEANS, dynamically, and then transpose back for reporting/display purposes or to meet requirements. 

 

If you want help debugging this code, you should explain what it is doing that's not what you want, because we don't know what you 'intended'. And you should include the log, after setting the following options so that messages are pasted to the log:

 

options mprint symbolgen mlogic;

kriscrowe wrote:

I've inherited some code that is not working as intended. Can someone decipher what this code is doing. It reads in excel spreadsheet and is supposed to calculate a mean over all quarters for each month on book value (1-36). I'd like to understand how it's doing this (if it is) and the reason for lag terms and initializing most recent quarter with _n. The log prints first. and last. values but I don't follow. The result is not as expected.

%macro cum_corate36(grade);
data LC_co_&grade.36; set LC_CO."&grade.36$A1:AQ37"N; run;
proc sort; by MOB; run;

data LC_co1_&grade.36; set LC_co_&grade.36; by MOB;
  _7Q3_n=_7Q3*1; 
  array num  _7Q3n _7Q2 _7Q1 _6Q4 _6Q3 _6Q2 _6Q1 _5Q4 _5Q3 _5Q2 _5Q1 _4Q4 _4Q3 _4Q2 _4Q1 _3Q4 _3Q3 _3Q2 _3Q1 _2Q4 _2Q3 _2Q2 _2Q1 _1Q4 _1Q3 _1Q2 _1Q1 _0Q4 _0Q3 _0Q2 _0Q1 _9Q4 _9Q3 _9Q2 _9Q1 _8Q4 _8Q3 _8Q2 _8Q1;
  array numlag1 lag1_7Q3 lag1_7Q2 lag1_7Q1 lag1_6Q4 lag1_6Q3 lag1_6Q2 lag1_6Q1 lag1_5Q4 lag1_5Q3 lag1_5Q2 lag1_5Q1 lag1_4Q4 lag1_4Q3 lag1_4Q2 lag1_4Q1 lag1_3Q4 lag1_3Q3 lag1_3Q2 lag1_3Q1 lag1_2Q4 lag1_2Q3 lag1_2Q2 lag1_2Q1 lag1_1Q4 lag1_1Q3 lag1_1Q2 lag1_1Q1 lag1_0Q4 lag1_0Q3 lag1_0Q2 lag1_0Q1 lag1_9Q4 lag1_9Q3 lag1_9Q2 lag1_9Q1 lag1_8Q4 lag1_8Q3 lag1_8Q2 lag1_8Q1;
  array numC C_7Q3 C_7Q2 C_7Q1 C_6Q4 C_6Q3 C_6Q2 C_6Q1 C_5Q4 C_5Q3 C_5Q2 C_5Q1 C_4Q4 C_4Q3 C_4Q2 C_4Q1 C_3Q4 C_3Q3 C_3Q2 C_3Q1 C_2Q4 C_2Q3 C_2Q2 C_2Q1 C_1Q4 C_1Q3 C_1Q2 C_1Q1 C_0Q4 C_0Q3 C_0Q2 C_0Q1 C_9Q4 C_9Q3 C_9Q2 C_9Q1 C_8Q4 C_8Q3 C_8Q2 C_8Q1;
  do over num;
  	numlag1=lag(num);
	numC=num-numlag1;
	if MOB eq 1 then numC=.;
  end;
  C=mean(of C_:);
  keep MOB C;
run;
%mend;

%cum_corate36(A);

 


 

View solution in original post


All Replies
Solution
‎02-25-2018 03:11 PM
Super User
Posts: 23,778

Re: Array Processing; Problem calculating mean across all obs

Posted in reply to kriscrowe

That is a great example of why you SHOULD NOT store your data in a wide format. 

 

I would recommend re-writing it to transpose your data to a long format and you'll easily be able to calculate your summaries using PROC MEANS, dynamically, and then transpose back for reporting/display purposes or to meet requirements. 

 

If you want help debugging this code, you should explain what it is doing that's not what you want, because we don't know what you 'intended'. And you should include the log, after setting the following options so that messages are pasted to the log:

 

options mprint symbolgen mlogic;

kriscrowe wrote:

I've inherited some code that is not working as intended. Can someone decipher what this code is doing. It reads in excel spreadsheet and is supposed to calculate a mean over all quarters for each month on book value (1-36). I'd like to understand how it's doing this (if it is) and the reason for lag terms and initializing most recent quarter with _n. The log prints first. and last. values but I don't follow. The result is not as expected.

%macro cum_corate36(grade);
data LC_co_&grade.36; set LC_CO."&grade.36$A1:AQ37"N; run;
proc sort; by MOB; run;

data LC_co1_&grade.36; set LC_co_&grade.36; by MOB;
  _7Q3_n=_7Q3*1; 
  array num  _7Q3n _7Q2 _7Q1 _6Q4 _6Q3 _6Q2 _6Q1 _5Q4 _5Q3 _5Q2 _5Q1 _4Q4 _4Q3 _4Q2 _4Q1 _3Q4 _3Q3 _3Q2 _3Q1 _2Q4 _2Q3 _2Q2 _2Q1 _1Q4 _1Q3 _1Q2 _1Q1 _0Q4 _0Q3 _0Q2 _0Q1 _9Q4 _9Q3 _9Q2 _9Q1 _8Q4 _8Q3 _8Q2 _8Q1;
  array numlag1 lag1_7Q3 lag1_7Q2 lag1_7Q1 lag1_6Q4 lag1_6Q3 lag1_6Q2 lag1_6Q1 lag1_5Q4 lag1_5Q3 lag1_5Q2 lag1_5Q1 lag1_4Q4 lag1_4Q3 lag1_4Q2 lag1_4Q1 lag1_3Q4 lag1_3Q3 lag1_3Q2 lag1_3Q1 lag1_2Q4 lag1_2Q3 lag1_2Q2 lag1_2Q1 lag1_1Q4 lag1_1Q3 lag1_1Q2 lag1_1Q1 lag1_0Q4 lag1_0Q3 lag1_0Q2 lag1_0Q1 lag1_9Q4 lag1_9Q3 lag1_9Q2 lag1_9Q1 lag1_8Q4 lag1_8Q3 lag1_8Q2 lag1_8Q1;
  array numC C_7Q3 C_7Q2 C_7Q1 C_6Q4 C_6Q3 C_6Q2 C_6Q1 C_5Q4 C_5Q3 C_5Q2 C_5Q1 C_4Q4 C_4Q3 C_4Q2 C_4Q1 C_3Q4 C_3Q3 C_3Q2 C_3Q1 C_2Q4 C_2Q3 C_2Q2 C_2Q1 C_1Q4 C_1Q3 C_1Q2 C_1Q1 C_0Q4 C_0Q3 C_0Q2 C_0Q1 C_9Q4 C_9Q3 C_9Q2 C_9Q1 C_8Q4 C_8Q3 C_8Q2 C_8Q1;
  do over num;
  	numlag1=lag(num);
	numC=num-numlag1;
	if MOB eq 1 then numC=.;
  end;
  C=mean(of C_:);
  keep MOB C;
run;
%mend;

%cum_corate36(A);

 


 

New Contributor
Posts: 2

Re: Array Processing; Problem calculating mean across all obs

Proc transpose worked with one problem. The first occurrence of 7Q3 is zero which is causing this quarter to be missing from every MOB(month on book) group.

 

For example, I would expect to see 17Q3 in transposed results and it is missing for MOB. 17Q3 value is zero for MOB=1 but >0 for MOB=2 in original table. I think this may be why  _7Q3_n=_7Q3*1; appeared in original code. What is best fix to include 17Q3 for all MOB?

MOBqtrCOL
117Q2 
117Q1 
116Q4 
116Q3 
116Q2 
116Q1 
115Q4 
115Q3 
217Q2 
217Q1 
216Q4 
216Q3 
216Q2 
216Q1 
215Q4 
215Q3 

 

 

Super User
Posts: 23,778

Re: Array Processing; Problem calculating mean across all obs

Posted in reply to kriscrowe

What does your starting data look like?

Super User
Super User
Posts: 8,127

Re: Array Processing; Problem calculating mean across all obs

[ Edited ]
Posted in reply to kriscrowe

So the main thing the data step is doing is taking a row of data values and finding the change in value between the current row's data and the previous row's data.  It then takes the mean of those differences.  So if you had data with two columns like this:

Row  Col1 Col2
 1    10   20 
 2    20   40 
 3    10   30 

Since there are two data columns it needs to make 2*2=4 new variables to store the lags and the differences and then one more final variable to store the mean of the differences for this row.

Row  Col1 Col2  Lag1 Lag2 Dif1 Dif2 Mean
 1    10   20     .    .    .    .    .
 2    20   40    10   20   10   20   15
 3    10   30    20   40  -10  -10  -10

I am not sure what the MOB variable is and why it wants to set DIF vars to missing when MOB=1.  Perhaps MOB=1 is like the first observation in a group?  In which case the LAG values have come from a different group and so you want to make the differences be missing because they are meaningless if there is no previous value. Like the are in the first overall row (and in my example above).  They should probably also set the lag value to missing, but perhaps they don't really care about the lag value.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 220 views
  • 2 likes
  • 3 in conversation