Dear,
I am using the following code to get the avg values. I got the out put I need but please suggest any other function I can use.
Without this statement "if bp ^=. then n+1" I am getting the output "avg=55.33" . With this statement added I got the out put I need avg=83. Please suggest any other soloution. Thank you
data one;
input id bp;
datalines;
1 80
1 86
1 .
;
data two;
set one;
by id;
vsum+bp;
if bp ^=. then n+1;
if first.id then do;
n=1; vsum=bp;
end;
output;
if last.id then do;
test='AVG';
bp=vsum/n;
output;
end;
run;
So basically you want the easiest way to get the mean value of the values in BP with missing values ignored right? 🙂
If this is the case, simply do
proc means data = one mean;
var bp;
run;
yes
Look at my edited reply 🙂 That will give you avg = 83. Do you want a data step solution or does that work for you?
Thanks for the code. But I need to use my code as there are sveral other variables to be populated for this avg obs . Please suggest in my code.
eg: for my data output needed;
id bp test
1 86 db
1 80 db
1 . db
1 83 dbavg
Ah ok. In that case, @mkeintzs code is pretty solid 🙂
@knveraraju91 wrote:
Thanks for the code. But I need to use my code as there are sveral other variables to be populated for this avg obs . Please suggest in my code.
eg: for my data output needed;
id bp test
1 86 db
1 80 db
1 . db
1 83 dbavg
If the sole purpose of this is for a report you may find that this can be done in a procedure such as Proc report.
If you create data sets where you mix values with mean/total values in the same column then you should 1) have a flag on the record somewhere to indicate the row is a summary or not and 2) make sure there is some sort of warning about the data set. Otherwise that average/total or other summary value may be accidentaly used when not intended. Consider the case of you are developing your process and are using:
Data have;
set have;
<your accumulator>
run;
and then decide you need to do this with another variable. So you run:
Data have;
set have;
<your accumulator>
<new accumulator code>
run;
Your accumulated value from the first pass has now been used to create a new accumulated value.
And yes, I have seen this exact thing happen in real world data and had to reconcile why final results didn't match intermediate results.
Of course, this is what proc means/proc summary is design to do for you.
But if you want to do it in a data step, just don't read in cases with missing bp values. Then you can automatically count all the other vars:
data two;
set one;
by id;
where bp^=.;
if first.id then vsum=0;
vsum+bp;
if last.id then do;
bp=vsum/ ifn(lag(_N_)=.,_n_,dif(_n_));
test='avg';
output;
end;
run;
Notes:
Editted addition: I didn't realize you wanted to keep the individual observations as well as the average, in which case:
data two (drop=vsum);
set one;
by id;
where bp^=.;
output;
if first.id then vsum=0;
vsum+bp;
if last.id then do;
bp=vsum/ifn(lag(_N_)=.,_n_,dif(_n_));
test=cats('AVG',test); *generates "AVGDB" if the test var is long enough and test="DB"*;
output;
end;
run;
No this is not particularly good if your are getting the average of multiple variables, which might have differing patterns of missing values.
Hi,
Your code working but it gave output with last OBS only. I added output statement in the code, but it missed the OBS with missing value.
data two;
set one;
by id;
where bp^=.;
if first.id then vsum=0;
vsum+bp;
output;
if last.id then do;
bp=vsum/ ifn(lag(_N_)=.,_n_,dif(_n_));
test='avg';
output;
end;
run;
output getting:
id bp test
1 86 db
1 80 db
1 83 dbavg
output needed;
id bp test
1 86 db
1 80 db
1 . db
1 83 dbavg
Ah you want to keep the missing observations too. Then you'll probably have to revert to explicit tracking of the N value.
data two (drop=vsum n);
do until (last.id);
set one;
by id;
output;
vsum=sum(vsum,bp);
n=sum(n,n(bp));
end;
test=cats('AVG',test);
bp=vsum/n;
output;
run;
The N function, and is N(bp) returns a zero if bp is missing, a one otherwise. It counts the number of non-missing values in the argument - as in N(x,y,z).
Holding skill would be help.
data one;
input id bp;
datalines;
1 80
1 86
1 .
2 80
2 86
2 .
;
data two;
set one;
by id;
array x{99999} _temporary_;
if first.id then do; n=0; call missing(of x{*});end;
n+1;
x{n}=bp;
output;
if last.id then do;
test='AVG';
bp=mean(of x{*});
output;
end;
run;
Yes, ... but ...
Every calculation requires clearing 99999 values ("call missing of x{*}"), and later scanning 99999 values ("mean(of x{*}") - even if 98999 of them are missing. With a large dataset, where the 99999 is used as protection against a large id group, it can cost in comparison to keeping a running sum.
I think the use the temporary arrays like this is probably more advisable when complex manipulation of the values would be needed, say like finding quantiles.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.