Help using Base SAS procedures

Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Accepted Solution Solved
Reply
Contributor
Posts: 29
Accepted Solution

Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

I feel like this should be really easy and obvious, but I can't figure it out. Any advice is appreciated!

 

It starts with just a simple crosstab. I need the output to be a list of values of x (number of events) for each level of y (regions), of which there are 13. There are 4 types of events, and I need a table to summarize each. Occasionally, there are 0 events in a region. However, I need the output to list all 13 regions. Is there some ridiculously simple solution that I've just overlooked? I tried the sparse option, and it didn't work.

 

proc freq data = have noprint;
    table region * num_events / nopercent norow nocol
    out = want;
run;

Thank you!!!!

Doctor J

Accepted Solutions
Solution
‎07-21-2017 05:50 PM
Super User
Posts: 17,863

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

SPARSE/LIST will not work in this case. You need to use PRELOADFMT or CLASSDATA instead.

If the data is not there, SAS cannot know it exists unless you tell it somehow. That 'somehow' is PRELOADFMT or CLASSDATA.

 

A quick search of those terms on here or lexjansen.com will illustrate fully worked examples that will show you how to modify your code/process.

 


kjowers wrote:

Hi,

 

Attached are my data and syntax files. I've also pasted the syntax below. The issue is that I need the last 2 sheets on the exported .xls file to list all 13 regions, with 0 values as appropriate, rather than excluding the regions with 0 values. I can't seem to get the list/sparse combo to work - it outputs the total sample, not the subsetted samples.

 

/*** import current month's data ***/

proc import out = have_master
	datafile = 'W:\Have\have.xls' 
	dbms = excel replace;
	getnames = yes;
run;


/*** assign region ***/

data have_1;
	set have_master;

	format region 2.;
	informat region 2.;

	if county in (2)			then region = 1;
	if county in (30)			then region = 2;
	if county in (3)			then region = 3;
	if county in (16)			then region = 4;
	if county in (15)			then region = 5;
	if county in (13)			then region = 6;
	if county in (1,11,21)		then region = 7;
	if county in (5,9,14,17,20)	then region = 8;
	if county in (19,22,23)		then region = 9;
	if county in (4,8,18)		then region = 10;
	if county in (7,12)			then region = 11;
	if county in (10)			then region = 12;
	if county in (6)			then region = 13;

run;


/*** create output datasets ***/

	/** subset variables **/

data have_2;
	set have_1;

	format lh 1.;
	format lc 1.;
	format ulh 1.;
	format ulc 1.;

	if type in ('registered')	then lh = 1;
	if type in ('licensed')		then lc = 1;
	if type in ('illegal') and illegal_type in ('h')
								then ulh = 1;
	if type in ('illegal') and illegal_type in ('c')
								then ulc = 1;

run;

	/** tables **/

proc freq data = have_2;
	where type in ('registered');
		tables lh * region / nopercent norow nocol
			out = lh;
run;

proc freq data = have_2;
	where type in ('licensed');
		tables lc * region / nopercent norow nocol
			out = lc;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('h');
		tables ulh * region / nopercent norow nocol
			out = ulh;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('c');
		table ulc * region / nopercent norow nocol
			out = ulc;
run;


/*** export output to .xls ***/

proc export data = lh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lh';

proc export data = lc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lc';

proc export data = ulh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulh';

proc export data = ulc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulc';

run;

 

Thank you!!!!


 

View solution in original post


All Replies
Super User
Posts: 10,516

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Can you at least show what the result is expected to look like?

 

It actually is not "ridiculously easy" easy to display values that do not appear in your data.

 

Depending on what I wanted for appearance I would likely create a custom format to display your region values and then use one of the procedures that can use Preloadfmt option.

 

Since you are creating an output data set you could use the SPARSE option if you also use the LIST option on the tables statement.

Contributor
Posts: 29

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Thanks, @ballardw - at least I don't feel quite so dumb.

 

Here's what I'm looking for at the minimum:

 

occ.PNG

If the output dataset ends up including percent, col, and row, etc., that's cool. I have a second .xls with formulas reading in the values in the num_events column, ordered by region, summarizing 2 decades' worth of this monthly data by year.

 

For some reason, when I use list and sparse like so:

 

proc freq data = have;
	where provider_type in ('Illegally-Operating');
		table region * num_events / list
			out = want sparse;
run;

 

this is my output.

 

occ2.PNG

 

What gives? Smiley Happy

 

Thank you!!!

 

Keri

Doctor J
Super User
Posts: 17,863

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Computers are dumb. If it's not in the data at all, how is SAS supposed to know it even exists? Obviously you need to tell it in some manner, and the way to specify that is using a FORMAT or a CLASS data set with ALL the levels/combinations needed. If you search PRELOADFMT and PROC TABULATE you'll see many ways of getting this done.

 

http://support.sas.com/resources/papers/proceedings11/087-2011.pdf

http://support.sas.com/resources/papers/proceedings11/239-2011.pdf

https://communities.sas.com/t5/Base-SAS-Programming/Proc-Tabulate-creating-a-row-of-zeros-for-nonexi...

 

Super User
Posts: 5,085

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Does NUM_EVENTS really represent the type of event?  If not, then what is the right variable name?

 

Whatever the proper name, it's easier to add the missing categories in later.  I can't test this right now, but it should create a shell with all the categories:

 

proc sql;

create table shell as

select distinct a.region, distinct b.num_events from have a, have b;

quit;

 

My SQL isn't super strong, but that should work.  Then merge it back in with the PROC FREQ results:

 

proc sort data=shell;

by region num_events;

run;

data really_want;

merge want (in=in1) shell;

by region num_events;

if in1=0 then do;

   count=0;

   percent=0;

end;

run;

Super User
Posts: 9,682

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Better post some sample data to explain your question.

 

1)try SPARSE option

table region * num_events /sparse ;

2) try ZERO option

weight count/zero ;

Contributor
Posts: 29

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Hi @Ksharp,

 

Thanks for your help!

 

Here's a sample of my data. There are other variables, but I don't use them here.

 

occ5.PNG

 

When I run:

 

proc freq data = have;
where type in ('illegal'); table region * num_events_1 / sparse out = want1; run; proc export data = want outfile = '\want.xls' dbms = excel replace; sheet = 'want1'; run;

this is my output:

 

occ6.PNG

 

The only problem is that the values in 'count' are for the entire 'have' dataset, not just the type = 'illegal' subset. Any idea why? I'm not familiar with how 'sparse' works.

 

Thanks!!!

Doctor J
Super User
Posts: 17,863

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Please post your code AND data as text, otherwise you're asking us to type out your data and code.

 

There are instructions here on how to include your data as a data step so it can be used immediately in SAS

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Contributor
Posts: 29

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Thanks, @Reeza! Will do! That makes total sense, and I can see how that could be a royal pain. I'll attach my syntax and data as files to a previous reply - I'm having a hard time getting the syntax you linked to to work for me (it's definitely me, not the syntax). It looks really cool and helpful, so I'll definitely take more time later to try to figure out how it's working. Thanks again!!!

Doctor J
Contributor
Posts: 29

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

[ Edited ]

Hi,

 

Attached are my data and syntax files. I've also pasted the syntax below. The issue is that I need the last 2 sheets on the exported .xls file to list all 13 regions, with 0 values as appropriate, rather than excluding the regions with 0 values. I can't seem to get the list/sparse combo to work - it outputs the total sample, not the subsetted samples.

 

/*** import current month's data ***/

proc import out = have_master
	datafile = 'W:\Have\have.xls' 
	dbms = excel replace;
	getnames = yes;
run;


/*** assign region ***/

data have_1;
	set have_master;

	format region 2.;
	informat region 2.;

	if county in (2)			then region = 1;
	if county in (30)			then region = 2;
	if county in (3)			then region = 3;
	if county in (16)			then region = 4;
	if county in (15)			then region = 5;
	if county in (13)			then region = 6;
	if county in (1,11,21)		then region = 7;
	if county in (5,9,14,17,20)	then region = 8;
	if county in (19,22,23)		then region = 9;
	if county in (4,8,18)		then region = 10;
	if county in (7,12)			then region = 11;
	if county in (10)			then region = 12;
	if county in (6)			then region = 13;

run;


/*** create output datasets ***/

	/** subset variables **/

data have_2;
	set have_1;

	format lh 1.;
	format lc 1.;
	format ulh 1.;
	format ulc 1.;

	if type in ('registered')	then lh = 1;
	if type in ('licensed')		then lc = 1;
	if type in ('illegal') and illegal_type in ('h')
								then ulh = 1;
	if type in ('illegal') and illegal_type in ('c')
								then ulc = 1;

run;

	/** tables **/

proc freq data = have_2;
	where type in ('registered');
		tables lh * region / nopercent norow nocol
			out = lh;
run;

proc freq data = have_2;
	where type in ('licensed');
		tables lc * region / nopercent norow nocol
			out = lc;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('h');
		tables ulh * region / nopercent norow nocol
			out = ulh;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('c');
		table ulc * region / nopercent norow nocol
			out = ulc;
run;


/*** export output to .xls ***/

proc export data = lh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lh';

proc export data = lc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lc';

proc export data = ulh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulh';

proc export data = ulc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulc';

run;

 

Thank you!!!!

Doctor J
Attachment
Solution
‎07-21-2017 05:50 PM
Super User
Posts: 17,863

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

SPARSE/LIST will not work in this case. You need to use PRELOADFMT or CLASSDATA instead.

If the data is not there, SAS cannot know it exists unless you tell it somehow. That 'somehow' is PRELOADFMT or CLASSDATA.

 

A quick search of those terms on here or lexjansen.com will illustrate fully worked examples that will show you how to modify your code/process.

 


kjowers wrote:

Hi,

 

Attached are my data and syntax files. I've also pasted the syntax below. The issue is that I need the last 2 sheets on the exported .xls file to list all 13 regions, with 0 values as appropriate, rather than excluding the regions with 0 values. I can't seem to get the list/sparse combo to work - it outputs the total sample, not the subsetted samples.

 

/*** import current month's data ***/

proc import out = have_master
	datafile = 'W:\Have\have.xls' 
	dbms = excel replace;
	getnames = yes;
run;


/*** assign region ***/

data have_1;
	set have_master;

	format region 2.;
	informat region 2.;

	if county in (2)			then region = 1;
	if county in (30)			then region = 2;
	if county in (3)			then region = 3;
	if county in (16)			then region = 4;
	if county in (15)			then region = 5;
	if county in (13)			then region = 6;
	if county in (1,11,21)		then region = 7;
	if county in (5,9,14,17,20)	then region = 8;
	if county in (19,22,23)		then region = 9;
	if county in (4,8,18)		then region = 10;
	if county in (7,12)			then region = 11;
	if county in (10)			then region = 12;
	if county in (6)			then region = 13;

run;


/*** create output datasets ***/

	/** subset variables **/

data have_2;
	set have_1;

	format lh 1.;
	format lc 1.;
	format ulh 1.;
	format ulc 1.;

	if type in ('registered')	then lh = 1;
	if type in ('licensed')		then lc = 1;
	if type in ('illegal') and illegal_type in ('h')
								then ulh = 1;
	if type in ('illegal') and illegal_type in ('c')
								then ulc = 1;

run;

	/** tables **/

proc freq data = have_2;
	where type in ('registered');
		tables lh * region / nopercent norow nocol
			out = lh;
run;

proc freq data = have_2;
	where type in ('licensed');
		tables lc * region / nopercent norow nocol
			out = lc;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('h');
		tables ulh * region / nopercent norow nocol
			out = ulh;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('c');
		table ulc * region / nopercent norow nocol
			out = ulc;
run;


/*** export output to .xls ***/

proc export data = lh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lh';

proc export data = lc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lc';

proc export data = ulh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulh';

proc export data = ulc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulc';

run;

 

Thank you!!!!


 

Contributor
Posts: 29

Re: Using proc freq for a crosstab (two-by-two); how do I include all levels of Y?

Awesome! Thank you so much!!! I really appreciate your help!

Doctor J
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 260 views
  • 1 like
  • 5 in conversation