I am using proc report with the `out=` option and I am seeing some strange behaviour using SAS Enterprise Guide version 7.15 HF8 (7.100.5.6214) (32-bit).
I have measurements of several metrics (CAT2) as well as change in measurements for the same metrics (CAT1). My intent is to create a table with summary statistics for each metric. To ensure that the metrics are listed in the same order for each category, and to keep the report clean, I created a format that standardizes the param names between the two categories. In theory, proc report should order by and print the formatted value, but the values of param in the output dataset should be the unformatted values.
What I am seeing instead, is some strange mix of param values from both categories as shown in the table below. Here is a reproducible version of the issue with invented data and the output table `unexpected` for reference.
CAT | PARAM | N | MEAN | _BREAK_ |
CAT1 | Anxiety and Fears | 2 | 2 | |
CAT1 | Change in Autism | 2 | 0 | |
CAT1 | Change in Hyperactivity and Inattention | 2 | 2 | |
CAT2 | Anxiety and Fears | 2 | 5 | |
CAT2 | Change in Autism | 2 | 6 | |
CAT2 | Change in Hyperactivity and Inattention | 2 | 4 |
proc format;
value $ my_format
"Change in Anxiety and Fears" = "Anxiety and Fears"
"Change in Autism" = "Global Severity of Autism"
"Change in Hyperactivity and Inattention" = "Hyperactivity and Inattention";
run;
data indata;
length cat $4 param $50;
input cat $ param $ val;
infile datalines delimiter=',';
datalines;
CAT1, Change in Anxiety and Fears, 1
CAT1, Change in Anxiety and Fears, 3
CAT1, Change in Autism, -1
CAT1, Change in Autism, 1
CAT1, Change in Hyperactivity and Inattention, 2
CAT1, Change in Hyperactivity and Inattention, 2
CAT2, Anxiety and Fears, 5
CAT2, Anxiety and Fears, 5
CAT2, Global Severity of Autism, 7
CAT2, Global Severity of Autism, 5
CAT2, Hyperactivity and Inattention, 3
CAT2, Hyperactivity and Inattention, 5
run;
proc report data=indata out=unexpected;
column cat param n val=mean;
define cat / group;
define param / group format=$my_format.;
define mean / analysis mean;
run;
data expected;
length cat $4 param $50;
input cat $ param $ n sum _break_;
infile datalines delimiter=',' dsd;
datalines;
CAT1, Change in Anxiety and Fears, 2, 2,
CAT1, Change in Autism, 2, 0,
CAT1, Change in Hyperactivity and Inattention, 2, 2,
CAT2, Anxiety and Fears, 2, 5,
CAT2, Global Severity of Autism, 2, 6,
CAT2, Hyperactivity and Inattention, 2, 4,
;
run;
Can anybody explain this strange behaviour?
Is it just that you printed UNEXPECTED without using the display format?
Why not just use a FORMAT statement to attach the format instead of an option on the DEFINE statement?
proc report data=indata out=unexpected;
column cat param n val=mean;
define cat / group;
define param / group ;
define mean / analysis mean;
format param $my_format.;
run;
proc print data=unexpected;
run;
ods select position;
proc contents data=unexpected varnum;
run;
SAS 9.4 on WINDOWS 13:52 Thursday, April 21, 2022 10 cat param n val CAT1 Anxiety and Fears 2 2 Global Severity of Autism 2 0 Hyperactivity and Inattention 2 2 CAT2 Anxiety and Fears 2 5 Global Severity of Autism 2 6 Hyperactivity and Inattention 2 4 SAS 9.4 on WINDOWS 13:52 Thursday, April 21, 2022 11 Obs cat param n mean _BREAK_ 1 CAT1 Anxiety and Fears 2 2 2 CAT1 Global Severity of Autism 2 0 3 CAT1 Hyperactivity and Inattention 2 2 4 CAT2 Anxiety and Fears 2 5 5 CAT2 Global Severity of Autism 2 6 6 CAT2 Hyperactivity and Inattention 2 4 SAS 9.4 on WINDOWS 13:52 Thursday, April 21, 2022 12 The CONTENTS Procedure Variables in Creation Order # Variable Type Len Format Label 1 cat Char 4 cat 2 param Char 50 $MY_FORMAT. param 3 n Num 8 n 4 mean Num 8 val 5 _BREAK_ Char 32
Formats to not change the stored values, just how they are displayed. Most procedures that group the formatted still will return one of the actual values (I assume the first one seen for that grouped value).
If you want to change the value add another step to store the formatted value. Either into the same variable (since it is character) or into a new variable (required if it is numeric).
data want;
set unexpected;
param=put(param,$my_format.);
run;
PS Who uses an output dataset from PROC REPORT? If you want calculate statistics use procedures that are designed to do that.
When a formatted value does not appear correct it may be that the actual value of the variable is different than the value the format is looking for/expecting so displays the raw value unless an other= option is used in the format.
You need to double check, NOT by printing, that your data does not have any leading spaces in the value of the "Change in Hyperactivity". If the function length for that value in the data set returns something other than 39 that is likely (or some other non-printable character)
Note that proc print, freq and most of the procedures that display values will left justify text values and in effect hide the existence of leading characters. There is also a minor chance that there is a non-space non-printable character somewhere in the value or a UTF character that doesn't quite align with the font used to create the format which is likely ASCII characters.
Is it just that you printed UNEXPECTED without using the display format?
Why not just use a FORMAT statement to attach the format instead of an option on the DEFINE statement?
proc report data=indata out=unexpected;
column cat param n val=mean;
define cat / group;
define param / group ;
define mean / analysis mean;
format param $my_format.;
run;
proc print data=unexpected;
run;
ods select position;
proc contents data=unexpected varnum;
run;
SAS 9.4 on WINDOWS 13:52 Thursday, April 21, 2022 10 cat param n val CAT1 Anxiety and Fears 2 2 Global Severity of Autism 2 0 Hyperactivity and Inattention 2 2 CAT2 Anxiety and Fears 2 5 Global Severity of Autism 2 6 Hyperactivity and Inattention 2 4 SAS 9.4 on WINDOWS 13:52 Thursday, April 21, 2022 11 Obs cat param n mean _BREAK_ 1 CAT1 Anxiety and Fears 2 2 2 CAT1 Global Severity of Autism 2 0 3 CAT1 Hyperactivity and Inattention 2 2 4 CAT2 Anxiety and Fears 2 5 5 CAT2 Global Severity of Autism 2 6 6 CAT2 Hyperactivity and Inattention 2 4 SAS 9.4 on WINDOWS 13:52 Thursday, April 21, 2022 12 The CONTENTS Procedure Variables in Creation Order # Variable Type Len Format Label 1 cat Char 4 cat 2 param Char 50 $MY_FORMAT. param 3 n Num 8 n 4 mean Num 8 val 5 _BREAK_ Char 32
Formats to not change the stored values, just how they are displayed. Most procedures that group the formatted still will return one of the actual values (I assume the first one seen for that grouped value).
If you want to change the value add another step to store the formatted value. Either into the same variable (since it is character) or into a new variable (required if it is numeric).
data want;
set unexpected;
param=put(param,$my_format.);
run;
PS Who uses an output dataset from PROC REPORT? If you want calculate statistics use procedures that are designed to do that.
You have some good points there @Tom . As you say, a format does not change the stored value, therefore I expected to the `unexpected` dataset to have the original unformatted values, but if you look closely at the table I included in my first post, you can see that the first row is formatted, the second two are not, and the last three are repeats of the first three even those rows were not in the value statement on the left side of the equals at all. It seems to have to do with those last three rows that already have the formatted values because when I change those rows in the input data the issue goes away and I get the unformatted values as expected.
Anyway, your advice to use a format statement does seem to work and I'm sure there are other ways to accomplish my task that make more sense, but that strange output still puzzles me.
Your FORMAT does not include an OTHER category so any value that does not EXACTLY match the values being formatted will print as the raw value.
So if the format definition is:
proc format;
value $ my_format
"Change in Anxiety and Fears" = "Anxiety and Fears"
"Change in Autism" = "Global Severity of Autism"
"Change in Hyperactivity and Inattention" = "Hyperactivity and Inattention"
;
run;
And you are seeing UNFORMATTED values like:
Anxiety and Fears Change in Autism Change in Hyperactivity and Inattention
Then those are the values in the data.
Play around with some values to see what it happening. Note that leading spaces in the values makes a difference. And that ODS output (like HTML, PDF or RTF) does NOT do a good job of showing leading spaces. Adding quotes helps.
proc format;
value $ my_format
"Change in Anxiety and Fears" = "Anxiety and Fears"
"Change in Autism" = "Global Severity of Autism"
"Change in Hyperactivity and Inattention" = "Hyperactivity and Inattention"
;
run;
data have;
input string $char50. ;
cards;
Anxiety and Fears
Change in Autism
Hyperactivity and Inattention
Change in Anxiety and Fears
Change in Autism
Change in Hyperactivity and Inattention
other text
Change in Anxiety and Fears
Change in Autism
Change in Hyperactivity and Inattention
;
data test;
set have ;
raw = quote(trim(string));
formatted =quote(trim(put(string,$my_format.)));
run;
proc print;
run;
Obs string raw formatted 1 Anxiety and Fears "Anxiety and Fears" "Anxiety and Fears" 2 Change in Autism "Change in Autism" "Global Severity of Autism" 3 Hyperactivity and Inattention "Hyperactivity and Inattention" "Hyperactivity and Inattention" 4 Change in Anxiety and Fears "Change in Anxiety and Fears" "Anxiety and Fears" 5 Change in Autism "Change in Autism" "Global Severity of Autism" 6 Change in Hyperactivity and Inattention "Change in Hyperactivity and Inattention" "Hyperactivity and Inattention" 7 other text "other text" "other text" 8 Change in Anxiety and Fears " Change in Anxiety and Fears" " Change in Anxiety and Fears" 9 Change in Autism " Change in Autism" " Change in Autism" 10 Change in Hyperactivity and Inattention " Change in Hyperactivity and Inattention" " Change in Hyperactivity and"
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.