BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jagnew
Obsidian | Level 7

Hi,

 

I have a feeling the solution I'm hoping for doesn't exist/isn't possible, but here goes:

 

- I have a dataset with 21 variables, one of which is ACTIVITY_TYPE which has 5 possible values - 1/2/3/4/5

- I want to look at all of the other variables against ACTIVITY_TYPE - what are the proportions of their responses across ACTIVITY TYPE?

- EG; one of the variables is 'Organisation' - A/B/C/D. I want to see for each of A/B/C/D what % is ACTIVITY_TYPE 1/2/3/4/5

- I have used PROC FREQ to get such a one way frequency analysis (it is only the 'col pct' and 'frequency' that I am interested in)

- I know I can use PROC FREQ to output multiple tables showing the results, but I will then be left with manual process to bring them together in one table

 

- Is there any way to shorten this? I am using EG.

 

Any advice would be appreciated.

 

Thanks 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

There are a couple of possibilities to investigate.

 

First, PROC TABULATE automatically removes observations that have a missing value for any of the CLASS variables.  You would need to add the MISSING option to include those observations.  It's particularly devious, because if variable A has a missing value, that observation is removed even for computing the distribution of variable B.

 

A second possibility to investigate:  do any of the CLASS variables have formats attached to them?  PROC CONTENTS would reveal that.  Formats can group many values into a single row in the output table.

View solution in original post

8 REPLIES 8
Reeza
Super User

Use the ODS table and clean that up. 

 

ods output crosstabfreqs=summary;
proc freq data=sashelp.class;
table sex*(_all_);
run;

proc print data=summary;run;

And heres an attempt at cleaning it up:

 

ods output crosstabfreqs=summary;
proc freq data=sashelp.class;
table sex*(_all_);
run;



data long;
	length variable $32. variable_value $50.;
	set summary;
	Variable=scan(table, 2, '*');
	Variable_Value=strip(trim(vvaluex(variable)));
	presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
	keep sex variable  variable_value frequency percent presentation;
	label variable='Variable' variable_value='Variable Value';
run;
jagnew
Obsidian | Level 7
Thanks Reeza, I haven't been able to get my head around this yet but I will have a look soon
Astounding
PROC Star

You should be able to get that directly from the data.  Getting the right denominator for percentages can be tricky, so see if this fits what you are asking for:

 

proc tabulate data=have;

class activity_type organisation plus a few more;

tables organization plus a few more,

   activity_type * pctn <activity_type>;

run;

mkeintz
PROC Star

Here's a little amplification of @Astounding's suggestion, which produces both N and column frequencies.  PROC TABULATE is not entirely intuitive, but addresses your request for combined tables:

 

data have;
  input atype b $  c sex $ e;
datalines;
1 X 1 M 1
1 X 2 M 1
1 Y 3 M 5
2 Y 1 F 2
2 C 2 F 3
3 D 3 M 3
3 X 1 M 3
3 Y 2 F 1
3 Y 3 F 1
4 D 1 M 2
4 X 2 F 2
5 C 3 F 2
5 C 1 F 1
5 D 2 F 1
5 D 3 F 1
5 D 1 M 1
run;

proc tabulate data=have noseps;
  class atype b c sex e;
  tables
    (b='VarB' c='VarC' sex='Sex' e='Var E')
     *(N*f=8.0 pctn<atype all>='%'*f=8.1)
	,
    (atype='A Type Pcts' all)
  /rts=12 row=float;
run; 
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jagnew
Obsidian | Level 7
Hi @mkeintz, apologies if I'm being dumb but what exactly are you doing in the tables statement? Thanks
jagnew
Obsidian | Level 7

Thanks Astounding. I think this is the right method/solution as it is close to what I need. However I am having issues with some of the variables only showing 1 value in the output - while some variables correctly show all their values....

Astounding
PROC Star

There are a couple of possibilities to investigate.

 

First, PROC TABULATE automatically removes observations that have a missing value for any of the CLASS variables.  You would need to add the MISSING option to include those observations.  It's particularly devious, because if variable A has a missing value, that observation is removed even for computing the distribution of variable B.

 

A second possibility to investigate:  do any of the CLASS variables have formats attached to them?  PROC CONTENTS would reveal that.  Formats can group many values into a single row in the output table.

jagnew
Obsidian | Level 7
Adding the MISSING option worked! Thank you, this has been very helpful

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1741 views
  • 4 likes
  • 4 in conversation