DATA Step, Macro, Functions and more

proc freq on chararter variable

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

proc freq on chararter variable

Hi all, 

I ran a proc freq on a character variable  "country_name" and I got  frequencies  in alphabatical order. 

I would have been satisfy with this result, but I need to have "other" as the last category in my output.

Thank you, 

 

Here is an abbreviated  list
Afghanistan
Albania
Bahrain
Barbados
Belarus

Cambodia

Denmark
Dominican Republic
Ecuador
Egypt

Fiji
Finland

Oman

Other 

.....

Zambia

 


Accepted Solutions
Solution
‎04-13-2016 11:15 PM
Trusted Advisor
Posts: 1,115

Re: proc freq on chararter variable

If you want to stick to your existing character variable, you could apply the ORDER= option of the PROC FREQ statement:

 

  1. One option is to use ORDER=DATA in conjunction with a sorted input dataset where observations with COUNTRY_NAME='Other' are sorted last. Example:
    proc sql;
    create view temp as
    select * from have
    order by country_name='Other', country_name;
    quit;
    
    proc freq data=temp order=data;
    tables country_name;
    run;
  2. Another option is to define a format for COUNTRY_NAME which sorts "Other" last and then use ORDER=FORMATTED. Unfortunately, PROC FREQ does not support the PRELOADFMT option that is available in PROC MEANS and PROC TABULATE. (With this you can specify the order of format categories in the PROC FORMAT statement.) So, to apply ORDER=FORMATTED, "Other" must be made the alphabetically last formatted value. Most easily, you could use the lower-case label "other". Otherwise, you could achieve the desired sort order by indenting all country names by a blank and "Other" by 'A0'x (non-breakable space). To avoid the indentation, you could try to prefix "Other" with an invisible character (with an ASCII code >90 [dec]), but this is platform and device dependent. On my Windows workstation the below format works with listing output (not HTML, ...):
    proc format;
    value $country
    'Other'='81'x 'Other' '01'x
    other=[$40.];
    run;
    
    proc freq data=have order=formatted;
    format country_name $country.;
    tables country_name;
    run;
    (The '01'x compensates the "outdentation" caused by the invisible '81'x character.) I would not recommend this for production purposes.

View solution in original post


All Replies
Super User
Posts: 17,899

Re: proc freq on chararter variable

I would capture the output to a table, sort as desired and then print the results. I feel like there should be a shorter way, but can't think of it at the moment. 

 

proc freq data=have;
table countries/out=country_freq;
run;

data country_freq;
set country_freq;
if country='Other' then order=9999999;
else order=_n_;
run;

proc sort data=country_freq;
by order;
run;

proc print data=country_freq;
run;
Contributor
Posts: 62

Re: proc freq on chararter variable

Thank you for your quick reply  Reeza.  I will give it a try.

 

Contributor
Posts: 62

Re: proc freq on chararter variable

@Reeza thank you!

Trusted Advisor
Posts: 1,115

Re: proc freq on chararter variable

Hi @archibald,

 

Wouldn't it be acceptable to write "other" in lower case? This should move it to the end of the alphabetical list (provided that all country names start with capital letters).

Contributor
Posts: 62

Re: proc freq on chararter variable

As a matter fact, this is pretty straigthfoward and interesting way of solving the issue, thanks! but the only problem is to format  "other' afterward so that it is in capital letter. 

Contributor
Posts: 62

Re: proc freq on chararter variable

@FreelanceReinhard, As a matter fact, this is pretty straigthfoward and interesting way of solving the issue, thanks! but the only problem is to format  "other' afterward so that it is in capital letter. 

Trusted Advisor
Posts: 1,115

Re: proc freq on chararter variable


archibald wrote:

... but the only problem is to format  "other' afterward so that it is in capital letter. 


But this is an easy "problem":

proc format;
value $country (default=40)
'other'='Other';
run;

proc freq data=have;
format country_name $country.;
tables country_name;
run;

(The option "default=40" specifies the length of the format. You could adapt the value 40 to the length of the longest country name.)

 

 

Trusted Advisor
Posts: 1,115

Re: proc freq on chararter variable

I think, the cleanest way to control the order is to define a variable, say, COUNTRY which does not contain the country name, but a numeric or character code. If this variable replaced COUNTRY_NAME, you would even save disk space, because the code could be very short (possibly only 1 byte, definitely not more than 3!). Make sure that the code for "Other" is sorted last.

 

Then you would define a format assigning country names (and "Other") to the codes and assign it to variable COUNTRY. By default, the internal values are used to sort PROC FREQ output. So, with tables country; you can get an output showing the country names, sorted by the code you defined, hence with "Other" sorted last.

Solution
‎04-13-2016 11:15 PM
Trusted Advisor
Posts: 1,115

Re: proc freq on chararter variable

If you want to stick to your existing character variable, you could apply the ORDER= option of the PROC FREQ statement:

 

  1. One option is to use ORDER=DATA in conjunction with a sorted input dataset where observations with COUNTRY_NAME='Other' are sorted last. Example:
    proc sql;
    create view temp as
    select * from have
    order by country_name='Other', country_name;
    quit;
    
    proc freq data=temp order=data;
    tables country_name;
    run;
  2. Another option is to define a format for COUNTRY_NAME which sorts "Other" last and then use ORDER=FORMATTED. Unfortunately, PROC FREQ does not support the PRELOADFMT option that is available in PROC MEANS and PROC TABULATE. (With this you can specify the order of format categories in the PROC FORMAT statement.) So, to apply ORDER=FORMATTED, "Other" must be made the alphabetically last formatted value. Most easily, you could use the lower-case label "other". Otherwise, you could achieve the desired sort order by indenting all country names by a blank and "Other" by 'A0'x (non-breakable space). To avoid the indentation, you could try to prefix "Other" with an invisible character (with an ASCII code >90 [dec]), but this is platform and device dependent. On my Windows workstation the below format works with listing output (not HTML, ...):
    proc format;
    value $country
    'Other'='81'x 'Other' '01'x
    other=[$40.];
    run;
    
    proc freq data=have order=formatted;
    format country_name $country.;
    tables country_name;
    run;
    (The '01'x compensates the "outdentation" caused by the invisible '81'x character.) I would not recommend this for production purposes.
Contributor
Posts: 62

Re: proc freq on chararter variable

@FreelanceReinhard the first option is perfect! thanks so much!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 411 views
  • 0 likes
  • 3 in conversation