Vertical Average by condition

Cho8 · Posted 11-09-2020 09:43 AM

Average of first 12 non-missing values vertically.

X	Y0	Y1	Y2	Y3	Y4	Y5	Y6	Y7	Y8	Y9	Y10
201201	1%	0%	1%	0%	0%	(0%)	.	.	.	.	.
201202	3%	(1%)	2%	1%	0%	0%	1%	.	.	.	.
201203	1%	0%	1%	0%	(0%)	0%	0%	0%	.	.	.
201204	2%	1%	0%	1%	2%	0%	(0%)	(0%)	1%	.	.
201205	1%	0%	0%	2%	(0%)	0%	1%	0%	3%	1%	.
201206	0%	1%	1%	2%	0%	(0%)	0%	(0%)	1%	3%	2%
201207	1%	2%	(0%)	(0%)	0%	0%	3%	0%	2%	(0%)	0%
201208	1%	(0%)	0%	1%	1%	0%	2%	2%	(0%)	1%	3%
201209	1%	1%	(0%)	3%	0%	2%	1%	(0%)	2%	3%	2%
201210	2%	(0%)	(0%)	1%	4%	3%	2%	1%	4%	4%	0%
201211	1%	2%	1%	(0%)	0%	(0%)	0%	3%	0%	0%	6%
201212	2%	0%	1%	(0%)	1%	6%	0%	2%	3%	(0%)	(0%)
201301	2%	2%	0%	(0%)	0%	0%	0%	(0%)	2%	(0%)	1%
201302	0%	0%	1%	0%	0%	1%	(0%)	1%	5%	(1%)	1%
201303	1%	1%	0%	1%	0%	1%	0%	1%	(0%)	1%	4%
201304	0%	4%	1%	1%	1%	3%	2%	4%	(1%)	1%	4%
AVG	0.012921	0.004853	0.005452	0.008107	0.006915	0.009627	0.008207	0.007293	0.018898	0.011633

PaigeMiller · Posted 11-09-2020 10:07 AM

This would be much easier if your data is in a long data set instead of a wide data set (wide data sets are rarely required and much harder to program).

As an example (to save myself some typing) here I get the average for the first THREE non-missing records. Note, this code will work for how ever many Y variables you may have, and with minor modifications will work for any number of non-missing records.

data have;
    input x y1 y2;
	cards;
	201201 0.1 .
	201202 0.3 .
	201203 -0.2 0.2
	201204 0 0.25
	201205 0.1 0.4
	201206 -0.1 -0.1
;

/* CONVERT TO A LONG DATA SET */
proc transpose data=have out=have_t;
	by x;
	var y:;
run;
proc sort data=have_t;
	by _name_ x;
run;
data want;
	set have_t;
	by _name_;
	if first._name_ then do;
		sum=0;
		counter=0;
	end;
	if not missing(col1) then do;
		counter+1;
		if counter<=3 then sum+col1;
	end;
	if last._name_ then do;
		mean=sum/3;
		output;
	end;
run;

--
Paige Miller

Cho8 · Posted 11-10-2020 01:18 AM

its giving 0 for all sum ,counter and mean.

missing value for Col1..

when i use mean=sum/12

PaigeMiller · Posted 11-10-2020 06:30 AM

@Cho8 wrote:

its giving 0 for all sum ,counter and mean.

missing value for Col1..

when i use mean=sum/12

It does not give 0 when I run it. You must not be running the same code.

--
Paige Miller

Kurt_Bremser · Posted 11-09-2020 10:16 AM

Maxims 19 & 33. As soon as you transpose to a long layout, the code almost writes itself.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 11-09-2020 10:19 AM

If you want to average percentages you better have both the numerator and denominator used to calculate the percentage as unless the denominator is exactly the same for every percentage then the "average" is suspect at best a possibly terribly misleading at worst.

Suppose I drive a car 1 mile and get 100 miles per gallon because I was able to do a lot of coasting. Then the next leg measure was 100 miles and I get 30 mpg because of more typical driving conditions. Is the "averager" mgp (100+30)/2 = 65 mpg?

Even worse, I strongly suspect the values you show are rounded, which even further reduces the reliability of the "average".

mkeintz · Posted 11-10-2020 01:05 AM

For each variable, you want to calculate the average of the first 12 non-missing values. This program (untested since you haven't generated your sample data in the form of a working data step) is one approach using the data in its current wide format:

data want (drop=i);
  set have end=end_of_have;
  array tmpsum {10} _temporary_;
  array tmpn   {10} _temporary_;

  array y {*} y: ;

  do i=1 to 10;
    if y{i}=. or tmpn{i}=12 then continue;
    tmpn{i}+1;
	tmpsum{i}+y{i};
  end;
  output;
  if end_of_have;
  do i=1 to 10;
    y{i}=tmpsum{i}/12;
  end;
  output;
run;

The tmpsum and tmpn array keep track of sums and counts of the 10 Y variables. For each iteration of the loop the CONTINUE statement means to go to the next iteration without performing subsequent within-loop statements. The provides a way to only add the first 12 values of each Y in the corresponding tmpsum/tmpn arrays.

There are two explicit OUTPUT statements - one for each obs in the data set, and one which is performed only at the end of data set HAVE, which writes on the desired average values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Cho8 · Posted 11-10-2020 02:59 AM

where do u print the vertical average for each variable

mkeintz · Posted 11-10-2020 08:19 PM

So now you want to PRINT the results? My program (did you actually run it and examine the resulting dataset?) appends an observation to the end of the dataset, with vars Y1-Y10 containing the desired average.

So run a proc print on that dataset.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 11-10-2020 07:46 AM

Is it what you are looking for ?

data have;
infile datalines expandtabs truncover;
input y  (d1-d9) (: percent8.);
datalines;
201201	1%	0%	1%	0%	0%	(0%)	.	.	.	.	.
201202	3%	(1%)	2%	1%	0%	0%	1%	.	.	.	.
201203	1%	0%	1%	0%	(0%)	0%	0%	0%	.	.	.
201204	2%	1%	0%	1%	2%	0%	(0%)	(0%)	1%	.	.
201205	1%	0%	0%	2%	(0%)	0%	1%	0%	3%	1%	.
201206	0%	1%	1%	2%	0%	(0%)	0%	(0%)	1%	3%	2%
201207	1%	2%	(0%)	(0%)	0%	0%	3%	0%	2%	(0%)	0%
201208	1%	(0%)	0%	1%	1%	0%	2%	2%	(0%)	1%	3%
201209	1%	1%	(0%)	3%	0%	2%	1%	(0%)	2%	3%	2%
201210	2%	(0%)	(0%)	1%	4%	3%	2%	1%	4%	4%	0%
201211	1%	2%	1%	(0%)	0%	(0%)	0%	3%	0%	0%	6%
201212	2%	0%	1%	(0%)	1%	6%	0%	2%	3%	(0%)	(0%)
201301	2%	2%	0%	(0%)	0%	0%	0%	(0%)	2%	(0%)	1%
201302	0%	0%	1%	0%	0%	1%	(0%)	1%	5%	(1%)	1%
201303	1%	1%	0%	1%	0%	1%	0%	1%	(0%)	1%	4%
201304	0%	4%	1%	1%	1%	3%	2%	4%
;

proc transpose data=have out=temp name=y prefix=_;
var d1-d9;
id y;
run;

data temp;
 set temp;
 array x{*} _:;
 count=0;
 do i=1 to dim(x);
  if not missing(x{i})  then do;
  count+1;
  sum=sum(sum,x{i});
  if count=12 then leave;
 end;
 end;
 _avg=sum/count;
 drop i  count sum;
run;

proc transpose data=temp out=want;
var _:;
id y;
run;

Vertical Average by condition

Re: Vertical Average by condition

Re: Vertical Average by condition

Re: Vertical Average by condition

Re: Vertical Average by condition

Re: Vertical Average by condition

Re: Vertical Average by condition

Re: Vertical Average by condition

Re: Vertical Average by condition

Re: Vertical Average by condition

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away