Solved: Re: Collating Proc Freq Table Values

jagnew · Posted 02-28-2018 11:50 AM

Hi,

I have a feeling the solution I'm hoping for doesn't exist/isn't possible, but here goes:

- I have a dataset with 21 variables, one of which is ACTIVITY_TYPE which has 5 possible values - 1/2/3/4/5

- I want to look at all of the other variables against ACTIVITY_TYPE - what are the proportions of their responses across ACTIVITY TYPE?

- EG; one of the variables is 'Organisation' - A/B/C/D. I want to see for each of A/B/C/D what % is ACTIVITY_TYPE 1/2/3/4/5

- I have used PROC FREQ to get such a one way frequency analysis (it is only the 'col pct' and 'frequency' that I am interested in)

- I know I can use PROC FREQ to output multiple tables showing the results, but I will then be left with manual process to bring them together in one table

- Is there any way to shorten this? I am using EG.

Any advice would be appreciated.

Thanks

Astounding · Posted 03-01-2018 09:21 AM

There are a couple of possibilities to investigate.

First, PROC TABULATE automatically removes observations that have a missing value for any of the CLASS variables. You would need to add the MISSING option to include those observations. It's particularly devious, because if variable A has a missing value, that observation is removed even for computing the distribution of variable B.

A second possibility to investigate: do any of the CLASS variables have formats attached to them? PROC CONTENTS would reveal that. Formats can group many values into a single row in the output table.

View solution in original post

Reeza · Posted 02-28-2018 11:56 AM

Use the ODS table and clean that up.

ods output crosstabfreqs=summary;
proc freq data=sashelp.class;
table sex*(_all_);
run;

proc print data=summary;run;

And heres an attempt at cleaning it up:

ods output crosstabfreqs=summary;
proc freq data=sashelp.class;
table sex*(_all_);
run;



data long;
	length variable $32. variable_value $50.;
	set summary;
	Variable=scan(table, 2, '*');
	Variable_Value=strip(trim(vvaluex(variable)));
	presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
	keep sex variable  variable_value frequency percent presentation;
	label variable='Variable' variable_value='Variable Value';
run;

jagnew · Posted 03-01-2018 09:58 AM

Thanks Reeza, I haven't been able to get my head around this yet but I will have a look soon

Astounding · Posted 02-28-2018 12:04 PM

You should be able to get that directly from the data. Getting the right denominator for percentages can be tricky, so see if this fits what you are asking for:

proc tabulate data=have;

class activity_type organisation plus a few more;

tables organization plus a few more,

activity_type * pctn <activity_type>;

run;

mkeintz · Posted 02-28-2018 11:36 PM

Here's a little amplification of @Astounding's suggestion, which produces both N and column frequencies. PROC TABULATE is not entirely intuitive, but addresses your request for combined tables:

data have;
  input atype b $  c sex $ e;
datalines;
1 X 1 M 1
1 X 2 M 1
1 Y 3 M 5
2 Y 1 F 2
2 C 2 F 3
3 D 3 M 3
3 X 1 M 3
3 Y 2 F 1
3 Y 3 F 1
4 D 1 M 2
4 X 2 F 2
5 C 3 F 2
5 C 1 F 1
5 D 2 F 1
5 D 3 F 1
5 D 1 M 1
run;

proc tabulate data=have noseps;
  class atype b c sex e;
  tables
    (b='VarB' c='VarC' sex='Sex' e='Var E')
     *(N*f=8.0 pctn<atype all>='%'*f=8.1)
	,
    (atype='A Type Pcts' all)
  /rts=12 row=float;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

jagnew · Posted 03-01-2018 09:01 AM

Hi @mkeintz, apologies if I'm being dumb but what exactly are you doing in the tables statement? Thanks

jagnew · Posted 03-01-2018 08:44 AM

Thanks Astounding. I think this is the right method/solution as it is close to what I need. However I am having issues with some of the variables only showing 1 value in the output - while some variables correctly show all their values....

Astounding · Posted 03-01-2018 09:21 AM

There are a couple of possibilities to investigate.

First, PROC TABULATE automatically removes observations that have a missing value for any of the CLASS variables. You would need to add the MISSING option to include those observations. It's particularly devious, because if variable A has a missing value, that observation is removed even for computing the distribution of variable B.

A second possibility to investigate: do any of the CLASS variables have formats attached to them? PROC CONTENTS would reveal that. Formats can group many values into a single row in the output table.

jagnew · Posted 03-01-2018 09:56 AM

Adding the MISSING option worked! Thank you, this has been very helpful

Classroom Training Available!