I have the following program and I'd like to try to use macros to simplify the repetative code. As you can see it runs the same steps for similarly named and structured datasets. The only difference is Quarter (Quarter 171 and Quarter 164). The datasets are read in and then merged.
Would there be a nice way to use macros to read in any number of Quarter specific datasets, say 163, 164, and 171, without having to hardcode the Quarter throughout?
Any help would be greatly appreciated!
/*Permanent storage location of quarterly summary datasets*/
libname DSs 'K:\TTMS\IV_Internal SPM\Parallel Testing\Parallel Test Harness\PTH_Quarterly_DSs';
data querydatasummary171;
set DSs.querydatasummary171;
if imb_dlvry_zip_5='171';
rename total=total_171 percent=percent_171;
drop rule_order score_impacting imb_dlvry_zip_5;
run;
/*Sort for merge*/
proc sort;
by rule_nm;
run;
data querydatasummary164;
set DSs.querydatasummary164;
if imb_dlvry_zip_5='164';
rename total=total_164 percent=percent_164;
drop rule_order score_impacting imb_dlvry_zip_5;
run;
/*Sort for merge*/
proc sort;
by rule_nm;
run;
/*Merge the datasets for comparison*/
data combined;
merge querydatasummary171 querydatasummary164;
by rule_nm;
run;
/*Reorder columns*/
data Quarterly_Summary_Comparisons;
retain rule_nm total_164 total_171 percent_164 percent_171;
set combined;
format total_164 comma12.0 total_171 comma12.0;
run;
/*For appearance*/
proc sort;
by descending total_171;
run;
/*Print report*/
proc print;
run;
I'm suspecting something similar ... since the variable names are TOTAL, I'm suspecting there is just one observation per combination of rul_nm and zip_5. If that's the case, here's a sample program:
proc tabulate data=have;
where imb_dlvry_zip_5 in ('164', '171');
class imb_dlvry_zip_5 rule_nm;
var total percent;
tables rule_nm, sum=' ' * imb_dlvry_zip_5 * (total percent);
run;
If there are actually multiple observations per rule_nm / zip_5, you may need to switch over to proc report instead of proc tabulate.
Check next macro and execution:
%macro prep_qtr(qtr);
   data querydatasummary&qtr.;
     set DSs.querydatasummary&qtr.;
           if imb_dlvry_zip_5="&qtr";
           rename total=total_&qtr.  percent=percent_&qtr.;
          drop rule_order score_impacting imb_dlvry_zip_5;
   run;
   /*Sort for merge*/
   proc sort;
      by rule_nm;
   run;
%mend prep_qtr;
%prep_qtr(171);
%prep_qtr(164);
Awesome! Thanks so much!
Is there anyway to simplify the Merge Statement as well using this macro logic? The problem is that there might be two Quarters (171, 164) or three Quarters (171, 164, 163), or even more quarters.
Am I stuck having to hard code the Merge Statement?
data combined;
merge querydatasummary171 querydatasummary164;
by rule_nm;
run;Here's a suggestion: provide some example data and what the result (the proc print) should look like.
I suspect the final result, proc print like output or very similar could be accomplished with one datastep and a report procedure. And would be extensible to more than two data sets.
And considering that your data sets have names like querydatasummary171, might even be possible without creating those individual summaries.
I'm suspecting something similar ... since the variable names are TOTAL, I'm suspecting there is just one observation per combination of rul_nm and zip_5. If that's the case, here's a sample program:
proc tabulate data=have;
where imb_dlvry_zip_5 in ('164', '171');
class imb_dlvry_zip_5 rule_nm;
var total percent;
tables rule_nm, sum=' ' * imb_dlvry_zip_5 * (total percent);
run;
If there are actually multiple observations per rule_nm / zip_5, you may need to switch over to proc report instead of proc tabulate.
Yes, in this case, you might have to begin with:
data have;
set DSs.querydatasummary171 DSs.querydatasummary164;
run;
It further assumes that each summary data set contains only data for that particular imb_dlvry_zip_5. If there are mixes and matches (for example, if DSs.querydatasummary164 could contain observations where imb_dlvry_zip_5 is actually "171"), there will be problems with this.
It might be possible to modify the process that creates the individual querydatasummary data sets, so you get all the data in one place. I don't know if you have control over that.
Your suggestion is genius. Thanks so much. I've attached the sample output using the Proc Tabulate as you suggested. 🙂
My only remaining issues would be how to do the following (if possible):
1. Order the resulting table by by Total, so that the largest Total appears at the top of the chart?
2. Format the Total column so that it has Commas, and so it does not have the trailing decimal values
3. Format the Percent column so that is displays as a true percentage (e.g. Volume Matching at the bottom of the attached output should display as 94.01% and 91.07% respectively)
I'm doing something wrong trying to get my formats included:
proc tabulate data=combined;
class imb_dlvry_zip_5 rule_nm;
var total percent;
tables rule_nm, sum=' ' * imb_dlvry_zip_5 * (total percent);
format total=comma12.0 percent=percent8.2;
run;
MPRINT(LOOPY):   proc tabulate data=combined;
NOTE: Line generated by the invoked macro "LOOPY".
5      class imb_dlvry_zip_5 rule_nm; var total percent; tables rule_nm, sum=' ' * imb_dlvry_zip_5 * (total percent); format
5   ! total=comma12.0 percent=percent8.2; run;
           -
           22
           200
MPRINT(LOOPY):   class imb_dlvry_zip_5 rule_nm;
MPRINT(LOOPY):   var total percent;
MPRINT(LOOPY):   tables rule_nm, sum=' ' * imb_dlvry_zip_5 * (total percent);
MPRINT(LOOPY):   format total=comma12.0 percent=percent8.2 run;
ERROR 22-322: Syntax error, expecting one of the following: a name, a format name, ;, -, :, _ALL_, _CHARACTER_, _CHAR_,
              _NUMERIC_.
ERROR 200-322: The symbol is not recognized and will be ignored.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE TABULATE used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              608.70k
      OS Memory           14728.00k
      Timestamp           02/07/2017 01:10:43 PMYou're moving in the right direction, but the syntax needs to be tweaked a little. TABULATE uses *f= to apply a format. Untested, but this would be my first attempt:
proc tabulate data=have;
where imb_dlvry_zip_5 in ('164', '171');
class imb_dlvry_zip_5 rule_nm;
var total percent;
tables rule_nm, sum=' ' * imb_dlvry_zip_5 * (total*f=comma10.0 percent*f=percent8.1);
run;
PROC TABULATE does support a few values for ORDER=, but I'm not sure they would help. After all, it's entirely possible that "164" would have a different maximum value than "171".
If your data is summarized then you could use
proc sort data=have;
by descending total;
run;
and
Proc tabulate data= have order=data;
When you have multiple columns with the variable "total", such as for each level of the imb_dlvry_sip_5, then the left most column may not have the expected order.
2 and 3 can be accomplished with
tables rule_nm, imb_dlvry_zip_5 * (total*sum=' ' *f=comma12. percent*sum=' ' *f=percent9.2);
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
