Hello, I'm doing reports in french language using proc tabulate. I use ordrer=formatted to display the full name of the categorical class variable in alphabetic order. I use dynamic formats but for simple example lets consider a format like :
value $accent
"EE1" = "Étudiant"
"EE2" = "Etudiant"
"TTT" = "Total";
Because of the accent character 'É' being after other normal letters in ASCII, the order=formatted display
Etudiant
Total
Étudiant
I can't use order=unformatted and fit the values of the variable in the good order for different reasons, the program is being more complex than this example and in production. The best solution I found is to avoid using accent characters in first position of new category names but I don't like it. I wonder if there is a way to display 'É' aside 'E' and not after "T". I will try to put unicode accent like this :
EE1 = "E{unicode '0301'x}tudiant"
But I am off working and back monday only to try... and I still don't really like it even if it works ahah! Anyways, simplest would be to find a way to tell SAS that I'm using accent characters in my alphabet (french) and I would like to keep the proper alphabetic order, it could save troubles in the future. The option locale fr_CA doesn't work for this.
Thanks if someone know how to solve that problem and have a great day, success in your projects!
Here is some code to test.
data AccentData;
infile datalines dlm=',' dsd;
input Type $ fee;
datalines;
AAA,40
BBB,35
CCC,38
,38
;
run;
proc format;
value $ctype
"AAA" = "Économies"
"BBB" = "Salaires"
"CCC" = "Investissements"
other = "Invéterminé";
run;
proc tabulate data=AccentData missing;
class Type / order=formatted;
var fee;
table type,fee;
format type $ctype.;
run;
The best way is to add the decoded value into the dataset and pre-sort it before the report.
That way you can use the SORTSEQ= option to control the sorting. Then in the reporting step use ORDER=DATA to preserve the order produced by PROC SORT.
data lookup ;
  infile cards dsd dlm='=' truncover ;
  input code $ decode :$20. ;
cards;
"EE1" = "Étudiant"
"EE2" = "Etudiant"
"TTT"= "Total"
;
proc sort data=lookup;
 by decode;
run;
proc print;
run;
proc sort data=lookup sortseq=linguistic;
   by decode;
run;
proc print;
run;Results
Some actual example data and Proc Tabulate code to play with would be helpful.
With a limited format, such as you show, I would experiment with options like NOTSORTED in the format and PRELOADFMT and Order=Data in the Proc tabulate code.
Another possibility could be a sort and order =data for that class variable but that can depend on just where in a table the variable appears.
This example appears to do what you want for a very limited format and data. It adds a variable to the SASHELP.CLASS set to use your format with in a random order.
Proc format;
value $accent  (notsorted)
"EE1" = "Étudiant"
"EE2" = "Etudiant"
"TTT" = "Total"
;
run;
data trial;
   set sashelp.class;
   select (rand('Table',.3,.3,.4));
      when (1) text='EE1';
      when (2) text='EE2';
      when (3) text='TTT';
   end;
run;
Proc tabulate data=trial;
   class text /preloadfmt ;
   format text $accent.;
   class sex age;
   table  text ,
          sex age
        
         /printmiss
   ;
   table sex,
         text
         /printmiss
   ;
run;
If you are using this with a multilabel format then real examples are needed as the appearance with those is VERY touchy about defined order and usage.
Thanks for the suggestion, I will get back with more elaborated code sample... variable is type of fees and it can take hundreds of values, but only few of them in each levels of another class variable. but your idea sound a nice solution to me, I will give you news about it !!!
@ballardw wrote:options like NOTSORTED in the format and PRELOADFMT and Order=Data in the Proc tabulate code
"Hundreds of values" may mean that a Preloadfmt option isn't desired unless your typical report already includes all, or almost, of the values.
Unless you mean you are having many values map to one value so the table is showing many fewer output groups.
Exactly, we need the new report to show same order as old ones, and the order was formatted. We realized that a new type of fee has been named with an É as first letter. we want to correct only those entries and replace them in order in next reports... Also, the tables are used in many other reports so we need to do as little modification as possible in the code, that is why just replace É with E would do it, but it is not good orthograph for formal reports...
Changing that in the format step of the code i added in first post show the problem more clearly :
proc format;
value $ctype
"AAA" = "E(*ESC*){unicode '0301'x}n"
"BBB" = "En"
"CCC" = "Invé"
other = "Inve";
run;
The rule says e (E) comes before é (É), it display fine for Invé that comes after Inve. But if we force with unicode accent, order is not really correct because Én comes before En, and it is not good order according to writing rules...
Sorry for the editing, I was lost myself 😄 but problem persists!
@marc11235 wrote:
Changing that in the format step of the code i added in first post show the problem more clearly :
proc format;
value $ctype
"AAA" = "E(*ESC*){unicode '0301'x}n"
"BBB" = "En"
"CCC" = "Invé"
other = "Inve";
run;
The rule says e (E) comes before é (É), it display fine for Invé that comes after Inve. But if we force with unicode accent, order is not really correct because Én comes before En, and it is not good order according to writing rules...
Sorry for the editing, I was lost myself 😄 but problem persists!
Data.
Maybe a fuller definition of the actual format(s) you want to use.
Proc Tabulate code.
I made one dummy small example that I believe shows the desired behavior for the format but cannot test what I can't see.
Thank you ballardw for the time spent helping me! You are right, to be able to give appropriated solution (and optimize testing scenarios) you need to know which are the constraints of the actual format I use. It will take me some time to answer with precision, the format creation is a bit in a black box in the program, but I won't give up!
In this simple exemple it works fine if there are no categories starting with A B C D. But it needs to be more robust...
In reality, the format is dynamic, created from another SAS table being udpate automatically. In that table, order is made by variable value (AAA, BBB, ... as exemple), labels are not necessarily in order ('AAA'='Taxe' BBB='Étudiant'). If I just patch that table, it won't prevent for any further problems caused by new categories.
We see that for accent characters not in first position, alphabetical sort is working fine by default. Problem is when first character is an accent character, and it might be a tiny limit of using SAS with french, it would be very rare that 2 categories would be exactly same letters but first with accent character or not. It could happens in names lists in which accents are very important but I never noticed same problem with proc report.
@Ksharp wrote:
Could you write some data step code to dynamically pad the blanks before format's LABEL for the dataset you are used to generated format ? after that using "PROC FORM CNTLIN=dataset " to make the right format ?
That was the plan, scanning the 'mother' table and changing 'É' to 'E' if in first position, it would be acceptable for the program results. But lets say categories are persons names, Émmy and Emmy would be problematic, and it would need to be shown in good order among the others.
I'm not sure using blank spots would fix it, it would always bring all the E É in good order but in top of all categories. Removing them after creating the format doesn't fix the problem when using order=formatted in proc tabulate.
The best way is to add the decoded value into the dataset and pre-sort it before the report.
That way you can use the SORTSEQ= option to control the sorting. Then in the reporting step use ORDER=DATA to preserve the order produced by PROC SORT.
data lookup ;
  infile cards dsd dlm='=' truncover ;
  input code $ decode :$20. ;
cards;
"EE1" = "Étudiant"
"EE2" = "Etudiant"
"TTT"= "Total"
;
proc sort data=lookup;
 by decode;
run;
proc print;
run;
proc sort data=lookup sortseq=linguistic;
   by decode;
run;
proc print;
run;Results
Thank you Tom! I will be able to fit that small change in the program, and we shouldn't have any similar problems in the future! Thanks again!!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
