Average of first 12 non-missing values vertically.
X | Y0 | Y1 | Y2 | Y3 | Y4 | Y5 | Y6 | Y7 | Y8 | Y9 | Y10 |
201201 | 1% | 0% | 1% | 0% | 0% | (0%) | . | . | . | . | . |
201202 | 3% | (1%) | 2% | 1% | 0% | 0% | 1% | . | . | . | . |
201203 | 1% | 0% | 1% | 0% | (0%) | 0% | 0% | 0% | . | . | . |
201204 | 2% | 1% | 0% | 1% | 2% | 0% | (0%) | (0%) | 1% | . | . |
201205 | 1% | 0% | 0% | 2% | (0%) | 0% | 1% | 0% | 3% | 1% | . |
201206 | 0% | 1% | 1% | 2% | 0% | (0%) | 0% | (0%) | 1% | 3% | 2% |
201207 | 1% | 2% | (0%) | (0%) | 0% | 0% | 3% | 0% | 2% | (0%) | 0% |
201208 | 1% | (0%) | 0% | 1% | 1% | 0% | 2% | 2% | (0%) | 1% | 3% |
201209 | 1% | 1% | (0%) | 3% | 0% | 2% | 1% | (0%) | 2% | 3% | 2% |
201210 | 2% | (0%) | (0%) | 1% | 4% | 3% | 2% | 1% | 4% | 4% | 0% |
201211 | 1% | 2% | 1% | (0%) | 0% | (0%) | 0% | 3% | 0% | 0% | 6% |
201212 | 2% | 0% | 1% | (0%) | 1% | 6% | 0% | 2% | 3% | (0%) | (0%) |
201301 | 2% | 2% | 0% | (0%) | 0% | 0% | 0% | (0%) | 2% | (0%) | 1% |
201302 | 0% | 0% | 1% | 0% | 0% | 1% | (0%) | 1% | 5% | (1%) | 1% |
201303 | 1% | 1% | 0% | 1% | 0% | 1% | 0% | 1% | (0%) | 1% | 4% |
201304 | 0% | 4% | 1% | 1% | 1% | 3% | 2% | 4% | (1%) | 1% | 4% |
AVG | 0.012921 | 0.004853 | 0.005452 | 0.008107 | 0.006915 | 0.009627 | 0.008207 | 0.007293 | 0.018898 | 0.011633 |
This would be much easier if your data is in a long data set instead of a wide data set (wide data sets are rarely required and much harder to program).
As an example (to save myself some typing) here I get the average for the first THREE non-missing records. Note, this code will work for how ever many Y variables you may have, and with minor modifications will work for any number of non-missing records.
data have;
input x y1 y2;
cards;
201201 0.1 .
201202 0.3 .
201203 -0.2 0.2
201204 0 0.25
201205 0.1 0.4
201206 -0.1 -0.1
;
/* CONVERT TO A LONG DATA SET */
proc transpose data=have out=have_t;
by x;
var y:;
run;
proc sort data=have_t;
by _name_ x;
run;
data want;
set have_t;
by _name_;
if first._name_ then do;
sum=0;
counter=0;
end;
if not missing(col1) then do;
counter+1;
if counter<=3 then sum+col1;
end;
if last._name_ then do;
mean=sum/3;
output;
end;
run;
its giving 0 for all sum ,counter and mean.
missing value for Col1..
when i use mean=sum/12
@Cho8 wrote:
its giving 0 for all sum ,counter and mean.
missing value for Col1..
when i use mean=sum/12
It does not give 0 when I run it. You must not be running the same code.
Maxims 19 & 33. As soon as you transpose to a long layout, the code almost writes itself.
If you want to average percentages you better have both the numerator and denominator used to calculate the percentage as unless the denominator is exactly the same for every percentage then the "average" is suspect at best a possibly terribly misleading at worst.
Suppose I drive a car 1 mile and get 100 miles per gallon because I was able to do a lot of coasting. Then the next leg measure was 100 miles and I get 30 mpg because of more typical driving conditions. Is the "averager" mgp (100+30)/2 = 65 mpg?
Even worse, I strongly suspect the values you show are rounded, which even further reduces the reliability of the "average".
For each variable, you want to calculate the average of the first 12 non-missing values. This program (untested since you haven't generated your sample data in the form of a working data step) is one approach using the data in its current wide format:
data want (drop=i);
set have end=end_of_have;
array tmpsum {10} _temporary_;
array tmpn {10} _temporary_;
array y {*} y: ;
do i=1 to 10;
if y{i}=. or tmpn{i}=12 then continue;
tmpn{i}+1;
tmpsum{i}+y{i};
end;
output;
if end_of_have;
do i=1 to 10;
y{i}=tmpsum{i}/12;
end;
output;
run;
The tmpsum and tmpn array keep track of sums and counts of the 10 Y variables. For each iteration of the loop the CONTINUE statement means to go to the next iteration without performing subsequent within-loop statements. The provides a way to only add the first 12 values of each Y in the corresponding tmpsum/tmpn arrays.
There are two explicit OUTPUT statements - one for each obs in the data set, and one which is performed only at the end of data set HAVE, which writes on the desired average values.
where do u print the vertical average for each variable
So now you want to PRINT the results? My program (did you actually run it and examine the resulting dataset?) appends an observation to the end of the dataset, with vars Y1-Y10 containing the desired average.
So run a proc print on that dataset.
Is it what you are looking for ?
data have; infile datalines expandtabs truncover; input y (d1-d9) (: percent8.); datalines; 201201 1% 0% 1% 0% 0% (0%) . . . . . 201202 3% (1%) 2% 1% 0% 0% 1% . . . . 201203 1% 0% 1% 0% (0%) 0% 0% 0% . . . 201204 2% 1% 0% 1% 2% 0% (0%) (0%) 1% . . 201205 1% 0% 0% 2% (0%) 0% 1% 0% 3% 1% . 201206 0% 1% 1% 2% 0% (0%) 0% (0%) 1% 3% 2% 201207 1% 2% (0%) (0%) 0% 0% 3% 0% 2% (0%) 0% 201208 1% (0%) 0% 1% 1% 0% 2% 2% (0%) 1% 3% 201209 1% 1% (0%) 3% 0% 2% 1% (0%) 2% 3% 2% 201210 2% (0%) (0%) 1% 4% 3% 2% 1% 4% 4% 0% 201211 1% 2% 1% (0%) 0% (0%) 0% 3% 0% 0% 6% 201212 2% 0% 1% (0%) 1% 6% 0% 2% 3% (0%) (0%) 201301 2% 2% 0% (0%) 0% 0% 0% (0%) 2% (0%) 1% 201302 0% 0% 1% 0% 0% 1% (0%) 1% 5% (1%) 1% 201303 1% 1% 0% 1% 0% 1% 0% 1% (0%) 1% 4% 201304 0% 4% 1% 1% 1% 3% 2% 4% ; proc transpose data=have out=temp name=y prefix=_; var d1-d9; id y; run; data temp; set temp; array x{*} _:; count=0; do i=1 to dim(x); if not missing(x{i}) then do; count+1; sum=sum(sum,x{i}); if count=12 then leave; end; end; _avg=sum/count; drop i count sum; run; proc transpose data=temp out=want; var _:; id y; run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.