BookmarkSubscribeRSS Feed
knveraraju91
Barite | Level 11

Dear,

 

I am using the following code to get the avg values. I got the out put I need but please suggest any other function I can use.

 

Without this statement   "if bp ^=. then n+1" I am getting the output "avg=55.33" . With this statement added I got the out put I need avg=83. Please suggest any other soloution. Thank you

 

data one;
input id bp;
datalines;
1 80
1 86
1 .
;

 

data two;
set one;
by id;

vsum+bp;
if bp ^=. then n+1;
if first.id then do;
n=1; vsum=bp;
end;
output;
if last.id then do;
test='AVG';
bp=vsum/n;

output;
end;
run;

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

So basically you want the easiest way to get the mean value of the values in BP with missing values ignored right? 🙂

 

If this is the case, simply do

 

proc means data = one mean;
   var bp;
run;
PeterClemmensen
Tourmaline | Level 20

Look at my edited reply 🙂 That will give you avg = 83. Do you want a data step solution or does that work for you?

knveraraju91
Barite | Level 11

Thanks for the code. But I need to use my code as there are sveral other variables to be populated for this avg obs . Please suggest in my code.

 

eg: for my data output needed;

 

id    bp          test

1     86           db

1     80          db

1      .            db

1      83          dbavg

PeterClemmensen
Tourmaline | Level 20

Ah ok. In that case, @mkeintzs code is pretty solid 🙂

ballardw
Super User

@knveraraju91 wrote:

Thanks for the code. But I need to use my code as there are sveral other variables to be populated for this avg obs . Please suggest in my code.

 

eg: for my data output needed;

 

id    bp          test

1     86           db

1     80          db

1      .            db

1      83          dbavg


If the sole purpose of this is for a report you may find that this can be done in a procedure such as Proc report.

If you create data sets where you mix values with mean/total values in the same column then you should 1) have a flag on the record somewhere to indicate the row is a summary or not and 2) make sure there is some sort of warning about the data set. Otherwise that average/total or other summary value may be accidentaly used when not intended. Consider the case of you are developing your process and are using:

 

Data have;

    set have;

<your accumulator>

run;

and then decide you need to do this with another variable. So you run:

Data have;

    set have;

<your accumulator>

<new accumulator code>

run;

 

Your accumulated value from the first pass has now been used to create a new accumulated value.

And yes, I have seen this exact thing happen in real world data and had to reconcile why final results didn't match intermediate results.

mkeintz
PROC Star

Of course, this is what proc means/proc summary is design to do for you.

 

But if you want to do it in a data step, just don't read in cases with missing bp values.  Then you can automatically count all the other vars:

 

data two;
   set one;
   by id;
   where bp^=.;

   if first.id then vsum=0;
   vsum+bp;

   if last.id then do;
      bp=vsum/ ifn(lag(_N_)=.,_n_,dif(_n_));
      test='avg';
      output;
   end;
run;

 

Notes:

  1. There is no need to actually generate an N variables for the divisor.  Instead, at the end of each ID group, you can use the function
        ifn(lag(_N_)=.,_N_,dif(_N_))
    which says if the first argument is true, return the 2nd arg, otherwsie return the 3rd arg. 

    1. The 1st arg    "lag(_N_)=."  asks whether this is effectively the first time the lag queue has been updated (in which case the lagged value of _N_ is missing.
    2. If this is the end of the first id group, the second argument returns the current observation number (counting only those that passed the WHERE test).
    3. But if this is the end of later id groups, the third argument returns the difference between the current _N_ and the last time the dif function was executed.  Since these are executed only at the end of each id group, the dif function is the number of observations between the end of the last id group and the curent id group.

Editted addition:  I didn't realize you wanted to keep the individual observations as well as the average, in which case:

 

data two (drop=vsum);

  set one;

  by id;

  where bp^=.;

  output;

 

  if first.id then vsum=0;

  vsum+bp;

 

  if last.id then do;

    bp=vsum/ifn(lag(_N_)=.,_n_,dif(_n_));

    test=cats('AVG',test);  *generates "AVGDB" if the test var is long enough and test="DB"*;

    output;

 end;

run;

 

No this is not particularly good if your are getting the average of multiple variables, which might have differing patterns of missing values.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
knveraraju91
Barite | Level 11

Hi,

 

Your code working but it gave output with  last OBS only. I added output statement in the code, but it missed the OBS with missing value.

 

data two;
set one;
by id;
where bp^=.;

if first.id then vsum=0;
vsum+bp;
output;

if last.id then do;
bp=vsum/ ifn(lag(_N_)=.,_n_,dif(_n_));
test='avg';
output;
end;
run;

 

output getting:

 

id    bp          test

1     86           db

1     80          db

1      83          dbavg

 

output needed;

 

id    bp          test

1     86           db

1     80          db

1      .            db

1      83          dbavg

 

 

mkeintz
PROC Star

Ah you want to keep the missing observations too.  Then you'll probably have to revert to explicit tracking of the N value.

 

data two (drop=vsum n);
  do until (last.id);
    set one;
    by id;
    output;
    vsum=sum(vsum,bp);
    n=sum(n,n(bp));
 end;

 test=cats('AVG',test);
 bp=vsum/n;
 output;
run;

 

 

The N function, and is N(bp) returns a zero if bp is missing, a one otherwise.  It counts the number of non-missing values in the argument - as in    N(x,y,z).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

Holding skill would be help.

 

data one;
input id bp;
datalines;
1 80
1 86
1 .
2 80
2 86
2 .
;
 
data two;
set one;
by id;
array x{99999} _temporary_;
if first.id then do; n=0; call missing(of x{*});end;
n+1;
x{n}=bp;
output;
if last.id then do;
 test='AVG';
 bp=mean(of x{*});
 output;
end;
run;
mkeintz
PROC Star

Yes, ... but ...

 

Every calculation requires clearing 99999 values ("call missing of x{*}"), and later scanning 99999 values ("mean(of x{*}") - even if 98999 of them are missing.  With a large dataset, where the 99999 is used as protection against a large id group, it can cost in comparison to keeping a running sum.

 

I think the use the temporary arrays like this is probably more advisable when complex manipulation of the values would be needed, say like finding quantiles.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2122 views
  • 4 likes
  • 5 in conversation