BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PavelD
Obsidian | Level 7

Hello!

 

imagine a dataset and format like this:

 

data have;
input my_category $ value;
datalines;
A 1
A 0
A 3
B 7
B 3
B 1
;
run;

proc format ;
value $total_format
other = "T"
;
run;


I want to aggregate over my_category, using a custom format (i know, in this simple example I could simply remove my_category from CLASS, but I need to have the column my_category = T in the output dataset).

 

So I use proc summary: 

 

proc summary nway data = have;
class my_category;
var value;

format my_category $total_format.;

output out = have_summed sum = ;
run;


this produces: 

 

 

my_category _TYPE_ _FREQ_ value
T           1      6      15


This is expected, so far so good. However, my_category retains the format $total_format, instead of... hardcoding it.

 

That means that if you remove the format with the following:

 

data have2;
set have_summed;
format my_category;
run;


the result is:

 

 

my_category _TYPE_ _FREQ_ value
A           1      6      15


i.e. "A" instead of "T", which is simply wrong! 

 

 

In have_summed, the column my_category appears to have the value "T".

If you export have_summed to a CSV, the formatted value "T" is retained. But if you use for example the CATS/CATX function with my_category, it does not use "T" but "A", i.e. the unformatted value. I find this very confusing. What am I missing? Is there a better way? 

 

Platform SAS 9.4.7.0. 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

If you want PROC SUMMARY to convert formatted values to character using value-labels you can add the multilabel-format option MLF to the class statement;

 

data have;
input my_category $ value;
datalines;
A 1
A 0
A 3
B 7
B 3
B 1
;
run;

proc format ;
   value $total_format other="Total";
   quit;
proc summary nway data = have;
   class my_category / mlf;
   var value;
   format my_category $total_format.;
   output out=have_summed sum=;
   run;
proc contents varnum;
proc print;
   run;

Capture.PNG

View solution in original post

22 REPLIES 22
PaigeMiller
Diamond | Level 26

It's not clear what you are trying to do in the last step

 

the result is:

 

my_category _TYPE_ _FREQ_ value
A           1      6      15


i.e. "A" instead of "T", which is simply wrong! 

but this is correct. Formatting doesn't change the underlying value. So when you remove the format, the underlying value is returned. Your earlier run of PROC SUMMARY puts all the value of my_category into one category which appears as "T" and the underlying value (unformatted) is the value of "A" (which is what PROC SUMMARY does when it combines categories, it uses the first value alphabetically or numerically to represent combined categories).

 

So what are you trying to do here?

 

 

 

--
Paige Miller
PavelD
Obsidian | Level 7

Hello @PaigeMiller, thanks for your answer.

I find it hard to believe that when aggregating over values, proc summary takes the first value as a label. I cannot think of a situation where that can be useful and not misleading...   From your comment I understand that is the expected behaviour? 

When you look at the last dataset, you see that the total for category "A" is 15. That is wrong. 

I would not care as much and simply keep using the formatted value, except that in some procedures or functions SAS takes the formatted values, sometimes  the unformatted value.

 

What I want is really to force the proc summary to use the format only once in the aggregation, not to assign it permanently to the variable while keeping the unformatted values. 

Something like the put function in a data step:

new_var = put(other_var, $my_format.);

In this case new_var is not assigned permanently $my_format.

 

Maybe my whole approach is not SAS-y enough, maybe there is a better way. I just cant think of any simple way...

 

PaigeMiller
Diamond | Level 26

@PavelD wrote:

Hello @PaigeMiller, thanks for your answer.

I find it hard to believe that when aggregating over values, proc summary takes the first value as a label. I cannot think of a situation where that can be useful and not misleading...   From your comment I understand that is the expected behaviour? 


What should PROC SUMMARY call a combined group? Well, anyway, yes, it picks the first value alphabetically or numerically in the combined group.

 

When you look at the last dataset, you see that the total for category "A" is 15. That is wrong. 


Actually, your idea of assigning formats and removing them is causing SAS to say "A" has 15. SAS does what you tell it to do. So its not SAS that's wrong, it's your method that causes the wrong answer.

 


What I want is really to force the proc summary to use the format only once in the aggregation, not to assign it permanently to the variable while keeping the unformatted values. 


So, when it does this to do the aggregation, what value should it assign to my_category? I'm really not following what you want here.

 

In addition, let's move away from the actual code for a second and explain the analysis you are trying to do from a higher level, with no discussion of code. You have data, you want a certain result, what is that desired result (without mentioning code)? Show us the desired result.

--
Paige Miller
PavelD
Obsidian | Level 7

In addition, let's move away from the actual code for a second and explain the analysis you are trying to do from a higher level, with no discussion of code. You have data, you want a certain result, what is that desired result (without mentioning code)? Show us the desired result.


Good point, I should have started with this in the first place. I appreciate your patience, @PaigeMiller and @Kurt_Bremser and others. 

Maybe my choice of procs will turn out to be wrong completely.

So, I have raw data, let's say 10 columns, of which one is numeric (the VAR in proc summary). Of this raw data I need to create dozens of partially aggregated datasets, using predefined set of custom formats which I apply to 2-4 columns each time. I need to maintain the structure of the raw dataset, i.e. the 9+1 columns, so I put all 9 in the CLASS. My other idea was that I could have did some remerging after proc summary but I did not like that option, I always try to look for the simplest option. So now I have dozens of aggregated datasets, since they all have the same structure, I wanted to SET them together and create a single dataset, but that failed completely because each dimension adopted the zombie format from the first SET dataset, and as I wrote above the formats differed in each aggregation. Ultimately I wanted to export it into CSV, so I simply exported dozens of CSVs instead of one. This worked. What I meant was that in my situation, it would simply be easier to hardcode formats after each aggregation, perhaps with VVALUE as another member suggested, I thought about that, but I was just hoping there is a neater, cleaner option. If you have any idea or general remarks, I am all ears. Thanks!!

PaigeMiller
Diamond | Level 26

I'm sure that PROC SUMMARY could work here, but I'm still not clear what you want (for example, I don't understand what you mean by partially aggregated data sets) and again I request that you SHOW ME the desired output.

--
Paige Miller
PavelD
Obsidian | Level 7
Thanks @PaigeMiller, I am sorry I cannot provide more details, nor the desired result, I think it would be a lots of tables and I dont think it would bring much value. Thanks for your help!!
PaigeMiller
Diamond | Level 26

@PavelD wrote:
Thanks @PaigeMiller, I am sorry I cannot provide more details, nor the desired result, I think it would be a lots of tables and I dont think it would bring much value. Thanks for your help!!

You could show us the desired output for your example that your provided. But you haven't done that.

--
Paige Miller
data_null__
Jade | Level 19

@PavelD I'm not able to follow your description of the ultimate goal. My fault not yours.  If you would start a new tread using the have/need scenario I expect you to get many excellent answers that may or may not involve PROC SUMMARY.

 

 


@PavelD wrote:

In addition, let's move away from the actual code for a second and explain the analysis you are trying to do from a higher level, with no discussion of code. You have data, you want a certain result, what is that desired result (without mentioning code)? Show us the desired result.


Good point, I should have started with this in the first place. I appreciate your patience, @PaigeMiller and @Kurt_Bremser and others. 

Maybe my choice of procs will turn out to be wrong completely.

So, I have raw data, let's say 10 columns, of which one is numeric (the VAR in proc summary). Of this raw data I need to create dozens of partially aggregated datasets, using predefined set of custom formats which I apply to 2-4 columns each time. I need to maintain the structure of the raw dataset, i.e. the 9+1 columns, so I put all 9 in the CLASS. My other idea was that I could have did some remerging after proc summary but I did not like that option, I always try to look for the simplest option. So now I have dozens of aggregated datasets, since they all have the same structure, I wanted to SET them together and create a single dataset, but that failed completely because each dimension adopted the zombie format from the first SET dataset, and as I wrote above the formats differed in each aggregation. Ultimately I wanted to export it into CSV, so I simply exported dozens of CSVs instead of one. This worked. What I meant was that in my situation, it would simply be easier to hardcode formats after each aggregation, perhaps with VVALUE as another member suggested, I thought about that, but I was just hoping there is a neater, cleaner option. If you have any idea or general remarks, I am all ears. Thanks!!


 

PavelD
Obsidian | Level 7
Both @data_null__ and @PaigeMiller have hundreds of replies on your account, since neither of you understood what I wanted to say, I think we all agree the problem is not on your side, but on mine. You are right, starting a new thread would be a better idea at this point! Thanks!
PaigeMiller
Diamond | Level 26

Great idea, @PavelD 

 

Please focus on where you want to go with this analysis rather than on specific code. Examples with input data and output results are extremely helpful. I strongly suspect that by starting the discussion with your formatting problem, that we had entered what is called the XY problem.

--
Paige Miller
Kurt_Bremser
Super User

SAS retains the raw value because the variable will also retain its type, so if your category was numeric, but formatted to character values, the formatted values would not be possible:

data have;
input my_category value;
datalines;
1 1
1 0
1 3
2 7
2 3
2 1
;

proc format ;
value total_format
other = "T"
;
run;

proc means nway data=have sum;
class my_category;
var value;
format my_category total_format.;
output out=want sum=;
run;

my_category in WANT is numeric, so the format is needed, and the lowest numeric value for the group is kept.

data_null__
Jade | Level 19

If you want PROC SUMMARY to convert formatted values to character using value-labels you can add the multilabel-format option MLF to the class statement;

 

data have;
input my_category $ value;
datalines;
A 1
A 0
A 3
B 7
B 3
B 1
;
run;

proc format ;
   value $total_format other="Total";
   quit;
proc summary nway data = have;
   class my_category / mlf;
   var value;
   format my_category $total_format.;
   output out=have_summed sum=;
   run;
proc contents varnum;
proc print;
   run;

Capture.PNG

PavelD
Obsidian | Level 7
Very weird trick, but it indeed works 😄 Thank you

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1368 views
  • 17 likes
  • 7 in conversation