BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jak1351
Fluorite | Level 6

Greetings!

 

I am trying to output average of a variable within a conditional do loop.

 

I have a dataset with 5 variables visitID, msdrg, diagnosis, cost1 cost2

 

My do loop is that if if the msdrg and diagnosis is the same, then new cost=average of all the cost (within that same msdrg and diagnosis).

 

My program is the following:


data want;
set have;
by DiagnosisDSC MSDRGWeightNBR;
if first.MSDRGWeightNBR and first.DiagnosisDSC then do;
new_cost=mean(cost1);
end;
run;

 

I know there is no "mean" function within a data statement and I will have to proc means but how do I do it within an ongoing do loop.

The above program returns the new_cost as the first value of cost1 for all the records where the msdrg and diagnosis is the same and blank for the others.

Whereas I want to have the average of all the cost1 (5 or 6 or however many observations) where msdrg and diagnosis are the same.

 

Thanks so much!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The MEAN(,) function takes the mean of a list of values. So MEAN(1,2,3) would return 2.  You cannot use it to get the mean over a series of observations.

 

You should just use PROC MEANS and re-merge.  

proc means noprint nway data=have;
  by DiagnosisDSC MSDRGWeightNBR;
  var cost1 ;
  output out=means(drop=_type_ _freq_) mean=new_cost;
run;
data want;
  merge have means;
  by DiagnosisDSC MSDRGWeightNBR;
run;

Or you could use PROC SQL and let it do the re-merge for you. Note that the MEAN() aggregate function in the SQL language is a totally different function than the MEAN(,...) function of the SAS language.

proc sql ;
create table want as
  select *, mean(cost1) as new_cost
  from have
  group by DiagnosisDSC , MSDRGWeightNBR
;
quit;

 

View solution in original post

6 REPLIES 6
Reeza
Super User
It sounds like you either want to replace missing with the mean values, is that correct?

If so, PROC STDIZE has a 'trick' that basically does it for you.

This blog post covers how to do it.
http://sasnrd.com/sas-replace-missing-values-mean/

Use the variables that create a group in your BY statement in the PROC STDIZE code as well.
jak1351
Fluorite | Level 6

Thanks for reply. My program is either giving me the first cost value for the same msdrg and diagnosis and giving me blanks for the rest of the costs for the same msdrg and diagnosis. 

 

For example:

 

MSDRG     DIAGNOSIS    COST1       NEW_COST

1.8                fracture         6801              6801

1.8                fracture         6801               ...

1.8                fracture         6801               ...

1.8                fracture         6801               ...

1.8                fracture         6801               ...

 

And the same for other same msdrg and diagnosis.

 

While I want the result to be 6801/5..

 

Does it make sense?

Reeza
Super User
Show what you have, what you want and your code ( and log if it has warnings or errors). I have no idea what method you used so no idea if it's correct or not.
jak1351
Fluorite | Level 6

I am sorry. Here is my dataset  "Have"

 

 

Obs            MSDRG          DIAGNOSIS    COST1      

  1                    1.8                fracture           1000

  2                    1.8                fracture           2000

  3                    1.8                fracture           3000

  4                    1.8                fracture           4000

  5                    1.8                fracture           5000

  6                    3.8                diabetes         1000

  7                    3.8                diabetes         2000

  8                    3.8                diabetes         3000

  9                    3.8                diabetes         4000

 10                   3.8                diabetes         5000

 

Data want:

 

Obs            MSDRG          DIAGNOSIS    COST1       Cost_new(average)

  1                    1.8                fracture           1000           3000 (average of same drg and diagnosis(1000+2000+3000+4000+5000)/5

  2                    1.8                fracture           2000

  3                    1.8                fracture           3000

  4                    1.8                fracture           4000

  5                    1.8                fracture           5000

  6                    3.8                diabetes         5000          3800(average of same drg and diagnosis(5000+2000+3000+4000+5000)/5

  7                    3.8                diabetes         2000

  8                    3.8                diabetes         3000

  9                    3.8                diabetes         4000

 10                   3.8                diabetes         5000

 

Code tried using:

data want;
set margin_match;
by DiagnosisDSC MSDRGWeightNBR;
if first.MSDRGWeightNBR and first.DiagnosisDSC then do;
DirectMarginAMT_bwh=mean(DirectMarginAMT);
end;
run;

But did not work. as mean is used in proc means...

No warning or error in log but the output is this:

 

Obs            MSDRG          DIAGNOSIS    COST1      cost_new(average)

  1                    1.8                fracture           1000         1000

  2                    1.8                fracture           2000           ......

  3                    1.8                fracture           3000           ......

  4                    1.8                fracture           4000           ......

  5                    1.8                fracture           5000           ......

  6                    3.8                diabetes         1000           1000

  7                    3.8                diabetes         2000           ......

  8                    3.8                diabetes         3000           .......

  9                    3.8                diabetes         4000           ......

 10                   3.8                diabetes         5000           .....

 

Thanks!

Tom
Super User Tom
Super User

The MEAN(,) function takes the mean of a list of values. So MEAN(1,2,3) would return 2.  You cannot use it to get the mean over a series of observations.

 

You should just use PROC MEANS and re-merge.  

proc means noprint nway data=have;
  by DiagnosisDSC MSDRGWeightNBR;
  var cost1 ;
  output out=means(drop=_type_ _freq_) mean=new_cost;
run;
data want;
  merge have means;
  by DiagnosisDSC MSDRGWeightNBR;
run;

Or you could use PROC SQL and let it do the re-merge for you. Note that the MEAN() aggregate function in the SQL language is a totally different function than the MEAN(,...) function of the SAS language.

proc sql ;
create table want as
  select *, mean(cost1) as new_cost
  from have
  group by DiagnosisDSC , MSDRGWeightNBR
;
quit;

 

jak1351
Fluorite | Level 6

Thank you so much. I tried proc sql and it worked!!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1703 views
  • 1 like
  • 3 in conversation