BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LoriK
Fluorite | Level 6

In SAS 9.3

 

I'm creating a table in Proc Report and would like to display the values for a variable named "country" with all values (country names and their corresponding counts) displayed unless the country represents <20% of the total count.  My dataset is huge, so I've created a sample of the data for illustration.  

 

SAMPLE DATA

PersonID   country      count       

1               Australia     1    

2               USA            1   

3               Australia     1

4               Canada      1

5               Canada      1

6               Canada      1

5               Germany    1   

 

I created new variable "countrycount" to sum totals for each country.


proc sql /*noprint */;
create table finalsql as
select * , sum(count) as countrycount label='Country Count'
from final
group by country
quit;

 

to give me output:

PersonID   country      count    countrycount      

1               Australia     1          2

2               USA            1          1

3               Australia     1          2

4               Canada      1          3

5               Canada      1          3

6               Canada      1          3

5               Germany    1          1

 

I would like to show a table where Canada and Australia are displayed, and USA and Germany are collapsed into the 'other' category.  I'd like to display the total counts by country, like this:

 

TABLE

Country    Total

Canada    3

Australia   2

Other        2

 

I am trying to automate report generation, and the arrival data will be changing monthly, so I need a dynamic process to collapse countries into the "other" category.  Is there a way to create a format that would essentially work like this? My challenge is to create a format based on the values from two variables, one character and one numeric.  OR is there another way to do this? 

 

 /*I realize this wouldn't actually work in proc format - hope it helps explains my question*/  

proc format

      value      $country_collapsed

                     country = country                         /* do this for all values where countrycount >=20% of total */

                     country totals <20% = 'other';       /* collapse all countries into 'other' category if countrycount <20% total */

                                                             

run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

I think this is an example of what you might want. Basically get a summary of the data, created a cntlin dataset for proc format filtered on your value range, feed that to proc format and use. This overwrites the existing, if any format, and recreates it.

 

 

proc freq data=sashelp.cars noprint ;
tables cylinders / out=percent;
run;

data cylindercntlin;
set percent;
fmtname = 'Cylinders';      
start = cylinders; end=cylinders;
label = 'Other';
Type='N';  /* for character this would be C*/
IF .< percent < 20; /* so only values with the percent less than 20 will be mapped*/
run;

proc format cntlin=cylindercntlin library=work;
run;

proc freq data=sashelp.cars ;
tables cylinders ;
format cylinders Cylinders.;
run;

View solution in original post

8 REPLIES 8
stevyfargose
Obsidian | Level 7

If proc format is not the only option you have than you can also try creating a view and use the view in your program

example:

 

proc sql;
        create view collapsed as
           select personid,country,count,
           case country
            when 'Canada' then 'Canada'
            when 'Australia' then 'Australia'
            else 'other' end
      from sample;

LoriK
Fluorite | Level 6

Thanks, Stevyfargose.  


I will try that - I am not tied to proc format, but I do need to figure out how to dynamically create a collapsed "other" category (I don't want to hard code anything because the country names will change from month to month, and so will the counts associated with them).

Cynthia_sas
SAS Super FREQ
Hi:
It sounds to me like something you would implement with a SAS Macro program, where the collapsing would happen inside a macro and the categories would be created.
cynthia
LoriK
Fluorite | Level 6

Thanks, Cynthia.  I'm going to try that and the CNTLIN options that you guys have suggested.  I appreciate your thoughts on how to do this!

ballardw
Super User

I think this is an example of what you might want. Basically get a summary of the data, created a cntlin dataset for proc format filtered on your value range, feed that to proc format and use. This overwrites the existing, if any format, and recreates it.

 

 

proc freq data=sashelp.cars noprint ;
tables cylinders / out=percent;
run;

data cylindercntlin;
set percent;
fmtname = 'Cylinders';      
start = cylinders; end=cylinders;
label = 'Other';
Type='N';  /* for character this would be C*/
IF .< percent < 20; /* so only values with the percent less than 20 will be mapped*/
run;

proc format cntlin=cylindercntlin library=work;
run;

proc freq data=sashelp.cars ;
tables cylinders ;
format cylinders Cylinders.;
run;
LoriK
Fluorite | Level 6

Thanks, ballardw.  I'm going to try this, as well.  If I can figure it out, I'll post again with my code to make it work.

LoriK
Fluorite | Level 6

I wanted to follow up.  I took the cntlin suggested by ballardw and with a little modification, it worked!  

 

The only thing I am still working through is how to force the "Other" and missing categories to sort to the last position while allowing everything else to sort alphabetically (and again, I can't hard code the formatting since the countries will change, more will be added, etc.).  I may end up having to manually do that in the end report - I can't figure out an order that will do the trick.

 

Thank you all for your help and suggestions.  Here is my final code:

 


proc freq data=final noprint;
tables country_of_birth / out=r.percent ;
run;

 

data countryofbirthcntlin;
set r.percent;
fmtname= 'COB';
start=country_of_birth; end=country_of_birth;
Type='C';                                                /*this would be N for numeric*/
IF 0< percent < 1 then label='Other* ' ;  /*so only values with percent less than 20 will be mapped to the other category*/
ELSE IF percent >=1 then label= country_of_birth;
run;

 

proc format cntlin=countryofbirthcntlin library=work;
run;

 

proc report data=countryofbirthcntlin;

column country_of_birth count;
define country_of_birth /  group format=$COB.;
define count / 'Number';

rbreak after / summarize page ;

compute after / style={just=l};

endcomp;

run;

ballardw
Super User

If you look at the Proc Format documentation you will see that there are a number of options that we have not addressed in this topic. One thing is to code a value format with different options and see if that helps. Then use the CNTLOUT option on proc format to see what the values set in the format used were so you can incorporate those elements into your format.

 

Also note that several procs have ORDER = directives that may interact with formats. For instance the DEFINE block supporst ORDER = with options of Data, Formatted, Freq and Internal. Without a specific instruction the default is Formatted, which will generally be the alphabetic order of your country format. So you would Lithuania, Other, Poland.

 

For Proc report you could either insure that the order of the data read into the procedure is the order you want and use the Order=Data option or go another step to create a variable that would be in the sequence you want, probably a numeric, and assign that sequence variable an equivalent and use the Order=internal option.

 

The steps above to get the format could also get you an informat, Invalue in proc format, to transform the name to a sequence variable.

 

 

 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 3048 views
  • 3 likes
  • 4 in conversation