DATA Step, Macro, Functions and more

Column totals

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 123
Accepted Solution

Column totals

I am attempting to create column totals for a few variables in a dataset. I want to store these totals in separate variables so I can use them in a new data step. I understand proc print has a sum statement that computes the total, but it doesn't store that total in a variable.


Accepted Solutions
Solution
‎03-08-2016 01:46 PM
Trusted Advisor
Posts: 1,118

Re: Column totals

[ Edited ]
Posted in reply to JediApprentice

Hi @JediApprentice,

 

PROC SUMMARY is more suitable for this purpose than PROC PRINT.

 

Edit:

Example:

proc summary data=sashelp.class;
var age weight height;
output out=totals sum=;
run;

The above step computes the totals of variables AGE, WEIGHT and HEIGHT (as listed in the VAR statement) across all observations in dataset SASHELP.CLASS and stores the results in variables of the same names in a dataset which I named TOTALS (see OUTPUT statement). The totals are requested by the SUM= option of the OUTPUT statement. (After the equals sign you could specify different variable names for the totals, if desired, e.g., sum=tot_age tot_wgt tot_hgt.)

View solution in original post


All Replies
Solution
‎03-08-2016 01:46 PM
Trusted Advisor
Posts: 1,118

Re: Column totals

[ Edited ]
Posted in reply to JediApprentice

Hi @JediApprentice,

 

PROC SUMMARY is more suitable for this purpose than PROC PRINT.

 

Edit:

Example:

proc summary data=sashelp.class;
var age weight height;
output out=totals sum=;
run;

The above step computes the totals of variables AGE, WEIGHT and HEIGHT (as listed in the VAR statement) across all observations in dataset SASHELP.CLASS and stores the results in variables of the same names in a dataset which I named TOTALS (see OUTPUT statement). The totals are requested by the SUM= option of the OUTPUT statement. (After the equals sign you could specify different variable names for the totals, if desired, e.g., sum=tot_age tot_wgt tot_hgt.)

Frequent Contributor
Posts: 123

Re: Column totals

Posted in reply to FreelanceReinhard

Thank you, this was very helpful. I have one more question going off of this. Take a look at my code:

data Total_Savings_2015_prep;
  set EGTASK.FL_INSTATE_SAVINGS_2015;
  if SPECIALTY = 'N' and BRAND_GENERIC = 'Brand' and ApprovedPriceType = 'AWP';
run;

proc summary data=Total_Savings_2015_prep;
  var SUM_OF_APPROVEDINGREDIENTCOST SUM_OF_ADFA SUM_OF_RX;
  output out=totals_sum_2015 sum=ING DF RX;
run;

proc summary data=Total_Savings_2015_prep;
  var AWP_Rate ApprovedDispensingFeeAmount;
  output out=totals_mean_2015 mean=Avg_AWP_Rate Avg_ADFA;
run;

data Total_Savings_2015;
  set totals_sum_2015; 
  set totals_mean_2015;
run;

You'll notice in the first data step, I am subsetting for certain values of different variables. I would like these variables to show up in the final dataset so that my output would look something like this:

 

SPECIALTY      BRAND_GENERIC      ApprovedPriceType      ING        DF        RX      Avg_AWP_Rate      Avg_ADFA

        N                        Brand                              AWP             $48,923     $2.00      32               18%                     $2.00

 

The data in blue is what I already have in that last dataset.

Trusted Advisor
Posts: 1,118

Re: Column totals

Posted in reply to JediApprentice

You're welcome.

 

You could create Total_Savings_2015 (incl. the desired additional variables) in one step from EGTASK.FL_INSTATE_SAVINGS_2015:

proc summary data=EGTASK.FL_INSTATE_SAVINGS_2015;
  where SPECIALTY = 'N' and BRAND_GENERIC = 'Brand' and ApprovedPriceType = 'AWP';
  id SPECIALTY BRAND_GENERIC ApprovedPriceType;
  var SUM_OF_APPROVEDINGREDIENTCOST SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;
  output out=Total_Savings_2015(drop=_:)
         sum(SUM_OF_APPROVEDINGREDIENTCOST SUM_OF_ADFA SUM_OF_RX)=ING DF RX
         mean(AWP_Rate ApprovedDispensingFeeAmount)=Avg_AWP_Rate Avg_ADFA;
run;

The WHERE statement restricts the summary statistics to the subset of observations which you called Total_Savings_2015_prep. The ID statement brings in the additional variables (actually the "maximum values" of these, but this is necessarily the single combination selected by the WHERE statement). The VAR statement lists all analysis variables. They are assigned to the two different statistics in the OUTPUT statement. Finally, the DROP= dataset option drops all variables whose names start with an underscore, i.e. the two default variables _TYPE_ and _FREQ_, from the output dataset (assuming you don't need them).

Frequent Contributor
Posts: 123

Re: Column totals

Posted in reply to FreelanceReinhard

Say I have multiple proc summaries with different criteria in the where statement to compute, like this:

proc summary data=EGTASK.FL_INSTATE_SAVINGS_2015;
  where Business_Line = 'Commercial' and Channel = 'Retail' and CLIENT = 'FL' and NETWORK = 'FL INSTATE' and Specialty_Flag = 'Non-Specialty' and BRAND_GENERIC = 'Brand' and ApprovedPriceType = 'AWP';
  id Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;
  var ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;
  output out=Total_Savings_FL_2015(drop=_:)
         sum(ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX)=ING_15 DF_15 RX_15
         mean(AWP_Rate ApprovedDispensingFeeAmount)=Avg_AWP_Rate_15 Avg_ADFA_15;
run;

proc summary data=EGTASK.FL_INSTATE_SAVINGS_2015;
  where Business_Line = 'Commercial' and Channel = 'Retail' and CLIENT = 'FL' and NETWORK = 'FL INSTATE' and Specialty_Flag = 'Non-Specialty' and BRAND_GENERIC = 'Generic' and ApprovedPriceType = 'AWP';
  id Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;
  var ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;
  output out=Total_Savings_FL_2015_2(drop=_:)
         sum(ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX)=ING_15 DF_15 RX_15
         mean(AWP_Rate ApprovedDispensingFeeAmount)=Avg_AWP_Rate_15 Avg_ADFA_15;
run;

proc summary data=EGTASK.FL_INSTATE_SAVINGS_2015;
  where Business_Line = 'Commercial' and Channel = 'Retail' and CLIENT = 'FL' and NETWORK = 'FL INSTATE' and Specialty_Flag = 'Non-Specialty' and BRAND_GENERIC = 'Generic' and index(ApprovedPriceType,'MAC') > 0;
  id Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;
  var ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;
  output out=Total_Savings_FL_2015_3(drop=_:)
         sum(ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX)=ING_15 DF_15 RX_15
         mean(AWP_Rate ApprovedDispensingFeeAmount)=Avg_AWP_Rate_15 Avg_ADFA_15;
run;

Would there be a more efficient way to cycle through the different cases (such as using a macro or a loop)? 

Trusted Advisor
Posts: 1,118

Re: Column totals

Posted in reply to JediApprentice

Hi @JediApprentice,

 

Yes, sure. You could use a macro to make the varying parts of the program code flexible. I'd suggest to make also the year flexible (so you can reuse the code until 2099 :-)).

 

%macro summ(yr=, crit=, suffix=);
proc summary data=EGTASK.FL_INSTATE_SAVINGS_20&yr;
  where Business_Line = 'Commercial' and Channel = 'Retail' and CLIENT = 'FL' and NETWORK = 'FL INSTATE' and Specialty_Flag = 'Non-Specialty' and &crit;
  id Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;
  var ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;
  output out=Total_Savings_FL_20&yr.&suffix(drop=_:)
         sum(ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX)=ING_&yr DF_&yr RX_&yr
         mean(AWP_Rate ApprovedDispensingFeeAmount)=Avg_AWP_Rate_&yr Avg_ADFA_&yr;
run;
%mend summ;

 

To execute the original three PROC SUMMARY steps, you just call the macro with the appropriate parameters:

%summ(yr=15, crit=BRAND_GENERIC = 'Brand' and ApprovedPriceType = 'AWP')
%summ(yr=15, crit=BRAND_GENERIC = 'Generic' and ApprovedPriceType = 'AWP', suffix=_2)
%summ(yr=15, crit=BRAND_GENERIC = 'Generic' and index(ApprovedPriceType,'MAC'), suffix=_3)

The first macro call doesn't specify parameter SUFFIX. As no default value for this parameter has been defined, nothing is appended to the output dataset name (as desired).

 

In the third macro call I have removed the "> 0" just for abbreviation, because it is redundant.

 

For debugging purposes you can use system option MPRINT to see the program code generated by a macro call in the log.

Frequent Contributor
Posts: 123

Re: Column totals

Posted in reply to FreelanceReinhard

Thank you, much appreciated.

Frequent Contributor
Posts: 123

Re: Column totals

Posted in reply to FreelanceReinhard

Okay. so I have a new variation of things, with slight differences:

proc summary data=EGTASK.FL_MULTIPLIER_SAVINGS_2015;
  where Business_Line = 'Commercial' and Channel = 'Retail' and CLIENT = 'FL' and NETWORK = 'FL INSTATE' and Specialty_Flag = 'Non-Specialty' and BRAND_GENERIC = 'Generic' and index(ApprovedPriceType,'MAC') > 0;
  id Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;
  var ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;
  output out=Total_Savings_FLv5_2015_3(drop=_:)
         sum(ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX)=ING_15 DF_15 RX_15
         mean(AWP_Rate ApprovedDispensingFeeAmount)=Avg_AWP_Rate_15 Avg_ADFA_15;
run;

proc summary data=EGTASK.FL_MULTIPLIER_SAVINGS_2015;
  where Business_Line = 'Commercial' and Channel = 'Retail' and CLIENT = 'FL' and NETWORK = 'FL INSTATE' and Specialty_Flag = 'Specialty' and BRAND_GENERIC = 'Generic' and index(ApprovedPriceType,'MAC') > 0;
  id Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;
  var ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;
  output out=Total_Savings_FLv5_2015_6(drop=_:)
         sum(ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX)=ING_15 DF_15 RX_15
         mean(AWP_Rate ApprovedDispensingFeeAmount)=Avg_AWP_Rate_15 Avg_ADFA_15;
run;

I've attempted to write a macro for these two and it looks like this (the main new change is that the output dataset name now has an "infix", i.e. "v5"), so I added a macro variable called infix:

 

%macro summ(yr=, crit=, infix=, suffix=);
proc summary data=EGTASK.FL_MULTIPLIER_SAVINGS_20&yr;
  where Business_Line = 'Commercial' and Channel = 'Retail' and CLIENT = 'FL' and NETWORK = 'FL INSTATE' and BRAND_GENERIC = 'Generic' and index(ApprovedPriceType,'MAC') > 0 and &crit;
  id Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;
  var ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;
  output out=Total_Savings_FL&infix_20&yr.&suffix(drop=_:)
         sum(ApprovedIngredientCost SUM_OF_ADFA SUM_OF_RX)=ING_&yr DF_&yr RX_&yr
         mean(AWP_Rate ApprovedDispensingFeeAmount)=Avg_AWP_Rate_&yr Avg_ADFA_&yr;
run;
%mend summ;

%summ(yr=15, crit=Specialty_Flag = 'Non-Specialty', infix=v5)
%summ(yr=15, crit=Specialty_Flag = 'Specialty', infix=v5, suffix=_2)

However, when I run it, I get an error on the &infix statement:

 

Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;   var ApprovedIngredientCost
38       ! SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;   output out=Total_Savings_FL&infix_20&yr.&suffix(drop=_:)
                                                                                                    _
                                                                                                    22
38       !        sum
ERROR 22-322: Syntax error, expecting one of the following: ;, (, /, CSS, CV, IDGROUP, IDGRP, KURTOSIS, LCLM, MAX, MAXID, MEAN, 
              MEDIAN, MIN, MINID, MODE, N, NMISS, OUT, P1, P10, P20, P25, P30, P40, P5, P50, P60, P70, P75, P80, P90, P95, P99, 
              PROBT, Q1, Q3, QRANGE, RANGE, SKEWNESS, STDDEV, STDERR, SUM, SUMWGT, T, UCLM, USS, VAR.  

NOTE: Line generated by the invoked macro "SUMM".
38          Business_Line Channel CLIENT NETWORK Specialty_Flag BRAND_GENERIC ApprovedPriceType;   var ApprovedIngredientCost
38       ! SUM_OF_ADFA SUM_OF_RX AWP_Rate ApprovedDispensingFeeAmount;   output out=Total_Savings_FL&infix_20&yr.&suffix(drop=_:)
                                                                                                    _
                                                                                                    76
38       !        sum
ERROR 76-322: Syntax error, statement will be ignored.

My guess is that I'm missing something simple here.

 

Thanks,

JediApprentice

Trusted Advisor
Posts: 1,118

Re: Column totals

Posted in reply to JediApprentice

Yes, indeed: Just insert a period after &infix. This is to indicate the end of the macro variable reference. Otherwise, SAS will try to resolve a macro variable reference &infix_20. The name infix_20 is a valid macro variable name, but a macro variable of this name does not exist, hence the errors.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 524 views
  • 5 likes
  • 2 in conversation