BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kjowers
Fluorite | Level 6

I feel like this should be really easy and obvious, but I can't figure it out. Any advice is appreciated!

 

It starts with just a simple crosstab. I need the output to be a list of values of x (number of events) for each level of y (regions), of which there are 13. There are 4 types of events, and I need a table to summarize each. Occasionally, there are 0 events in a region. However, I need the output to list all 13 regions. Is there some ridiculously simple solution that I've just overlooked? I tried the sparse option, and it didn't work.

 

proc freq data = have noprint;
    table region * num_events / nopercent norow nocol
    out = want;
run;

Thank you!!!!

Doctor J
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

SPARSE/LIST will not work in this case. You need to use PRELOADFMT or CLASSDATA instead.

If the data is not there, SAS cannot know it exists unless you tell it somehow. That 'somehow' is PRELOADFMT or CLASSDATA.

 

A quick search of those terms on here or lexjansen.com will illustrate fully worked examples that will show you how to modify your code/process.

 


@kjowers wrote:

Hi,

 

Attached are my data and syntax files. I've also pasted the syntax below. The issue is that I need the last 2 sheets on the exported .xls file to list all 13 regions, with 0 values as appropriate, rather than excluding the regions with 0 values. I can't seem to get the list/sparse combo to work - it outputs the total sample, not the subsetted samples.

 

/*** import current month's data ***/

proc import out = have_master
	datafile = 'W:\Have\have.xls' 
	dbms = excel replace;
	getnames = yes;
run;


/*** assign region ***/

data have_1;
	set have_master;

	format region 2.;
	informat region 2.;

	if county in (2)			then region = 1;
	if county in (30)			then region = 2;
	if county in (3)			then region = 3;
	if county in (16)			then region = 4;
	if county in (15)			then region = 5;
	if county in (13)			then region = 6;
	if county in (1,11,21)		then region = 7;
	if county in (5,9,14,17,20)	then region = 8;
	if county in (19,22,23)		then region = 9;
	if county in (4,8,18)		then region = 10;
	if county in (7,12)			then region = 11;
	if county in (10)			then region = 12;
	if county in (6)			then region = 13;

run;


/*** create output datasets ***/

	/** subset variables **/

data have_2;
	set have_1;

	format lh 1.;
	format lc 1.;
	format ulh 1.;
	format ulc 1.;

	if type in ('registered')	then lh = 1;
	if type in ('licensed')		then lc = 1;
	if type in ('illegal') and illegal_type in ('h')
								then ulh = 1;
	if type in ('illegal') and illegal_type in ('c')
								then ulc = 1;

run;

	/** tables **/

proc freq data = have_2;
	where type in ('registered');
		tables lh * region / nopercent norow nocol
			out = lh;
run;

proc freq data = have_2;
	where type in ('licensed');
		tables lc * region / nopercent norow nocol
			out = lc;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('h');
		tables ulh * region / nopercent norow nocol
			out = ulh;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('c');
		table ulc * region / nopercent norow nocol
			out = ulc;
run;


/*** export output to .xls ***/

proc export data = lh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lh';

proc export data = lc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lc';

proc export data = ulh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulh';

proc export data = ulc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulc';

run;

 

Thank you!!!!


 

View solution in original post

11 REPLIES 11
ballardw
Super User

Can you at least show what the result is expected to look like?

 

It actually is not "ridiculously easy" easy to display values that do not appear in your data.

 

Depending on what I wanted for appearance I would likely create a custom format to display your region values and then use one of the procedures that can use Preloadfmt option.

 

Since you are creating an output data set you could use the SPARSE option if you also use the LIST option on the tables statement.

kjowers
Fluorite | Level 6

Thanks, @ballardw - at least I don't feel quite so dumb.

 

Here's what I'm looking for at the minimum:

 

occ.PNG

If the output dataset ends up including percent, col, and row, etc., that's cool. I have a second .xls with formulas reading in the values in the num_events column, ordered by region, summarizing 2 decades' worth of this monthly data by year.

 

For some reason, when I use list and sparse like so:

 

proc freq data = have;
	where provider_type in ('Illegally-Operating');
		table region * num_events / list
			out = want sparse;
run;

 

this is my output.

 

occ2.PNG

 

What gives? 🙂

 

Thank you!!!

 

Keri

Doctor J
Reeza
Super User

Computers are dumb. If it's not in the data at all, how is SAS supposed to know it even exists? Obviously you need to tell it in some manner, and the way to specify that is using a FORMAT or a CLASS data set with ALL the levels/combinations needed. If you search PRELOADFMT and PROC TABULATE you'll see many ways of getting this done.

 

http://support.sas.com/resources/papers/proceedings11/087-2011.pdf

http://support.sas.com/resources/papers/proceedings11/239-2011.pdf

https://communities.sas.com/t5/Base-SAS-Programming/Proc-Tabulate-creating-a-row-of-zeros-for-nonexi...

 

Astounding
PROC Star

Does NUM_EVENTS really represent the type of event?  If not, then what is the right variable name?

 

Whatever the proper name, it's easier to add the missing categories in later.  I can't test this right now, but it should create a shell with all the categories:

 

proc sql;

create table shell as

select distinct a.region, distinct b.num_events from have a, have b;

quit;

 

My SQL isn't super strong, but that should work.  Then merge it back in with the PROC FREQ results:

 

proc sort data=shell;

by region num_events;

run;

data really_want;

merge want (in=in1) shell;

by region num_events;

if in1=0 then do;

   count=0;

   percent=0;

end;

run;

Ksharp
Super User

Better post some sample data to explain your question.

 

1)try SPARSE option

table region * num_events /sparse ;

2) try ZERO option

weight count/zero ;

kjowers
Fluorite | Level 6

Hi @Ksharp,

 

Thanks for your help!

 

Here's a sample of my data. There are other variables, but I don't use them here.

 

occ5.PNG

 

When I run:

 

proc freq data = have;
where type in ('illegal'); table region * num_events_1 / sparse out = want1; run; proc export data = want outfile = '\want.xls' dbms = excel replace; sheet = 'want1'; run;

this is my output:

 

occ6.PNG

 

The only problem is that the values in 'count' are for the entire 'have' dataset, not just the type = 'illegal' subset. Any idea why? I'm not familiar with how 'sparse' works.

 

Thanks!!!

Doctor J
Reeza
Super User

Please post your code AND data as text, otherwise you're asking us to type out your data and code.

 

There are instructions here on how to include your data as a data step so it can be used immediately in SAS

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

kjowers
Fluorite | Level 6

Thanks, @Reeza! Will do! That makes total sense, and I can see how that could be a royal pain. I'll attach my syntax and data as files to a previous reply - I'm having a hard time getting the syntax you linked to to work for me (it's definitely me, not the syntax). It looks really cool and helpful, so I'll definitely take more time later to try to figure out how it's working. Thanks again!!!

Doctor J
kjowers
Fluorite | Level 6

Hi,

 

Attached are my data and syntax files. I've also pasted the syntax below. The issue is that I need the last 2 sheets on the exported .xls file to list all 13 regions, with 0 values as appropriate, rather than excluding the regions with 0 values. I can't seem to get the list/sparse combo to work - it outputs the total sample, not the subsetted samples.

 

/*** import current month's data ***/

proc import out = have_master
	datafile = 'W:\Have\have.xls' 
	dbms = excel replace;
	getnames = yes;
run;


/*** assign region ***/

data have_1;
	set have_master;

	format region 2.;
	informat region 2.;

	if county in (2)			then region = 1;
	if county in (30)			then region = 2;
	if county in (3)			then region = 3;
	if county in (16)			then region = 4;
	if county in (15)			then region = 5;
	if county in (13)			then region = 6;
	if county in (1,11,21)		then region = 7;
	if county in (5,9,14,17,20)	then region = 8;
	if county in (19,22,23)		then region = 9;
	if county in (4,8,18)		then region = 10;
	if county in (7,12)			then region = 11;
	if county in (10)			then region = 12;
	if county in (6)			then region = 13;

run;


/*** create output datasets ***/

	/** subset variables **/

data have_2;
	set have_1;

	format lh 1.;
	format lc 1.;
	format ulh 1.;
	format ulc 1.;

	if type in ('registered')	then lh = 1;
	if type in ('licensed')		then lc = 1;
	if type in ('illegal') and illegal_type in ('h')
								then ulh = 1;
	if type in ('illegal') and illegal_type in ('c')
								then ulc = 1;

run;

	/** tables **/

proc freq data = have_2;
	where type in ('registered');
		tables lh * region / nopercent norow nocol
			out = lh;
run;

proc freq data = have_2;
	where type in ('licensed');
		tables lc * region / nopercent norow nocol
			out = lc;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('h');
		tables ulh * region / nopercent norow nocol
			out = ulh;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('c');
		table ulc * region / nopercent norow nocol
			out = ulc;
run;


/*** export output to .xls ***/

proc export data = lh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lh';

proc export data = lc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lc';

proc export data = ulh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulh';

proc export data = ulc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulc';

run;

 

Thank you!!!!

Doctor J
Reeza
Super User

SPARSE/LIST will not work in this case. You need to use PRELOADFMT or CLASSDATA instead.

If the data is not there, SAS cannot know it exists unless you tell it somehow. That 'somehow' is PRELOADFMT or CLASSDATA.

 

A quick search of those terms on here or lexjansen.com will illustrate fully worked examples that will show you how to modify your code/process.

 


@kjowers wrote:

Hi,

 

Attached are my data and syntax files. I've also pasted the syntax below. The issue is that I need the last 2 sheets on the exported .xls file to list all 13 regions, with 0 values as appropriate, rather than excluding the regions with 0 values. I can't seem to get the list/sparse combo to work - it outputs the total sample, not the subsetted samples.

 

/*** import current month's data ***/

proc import out = have_master
	datafile = 'W:\Have\have.xls' 
	dbms = excel replace;
	getnames = yes;
run;


/*** assign region ***/

data have_1;
	set have_master;

	format region 2.;
	informat region 2.;

	if county in (2)			then region = 1;
	if county in (30)			then region = 2;
	if county in (3)			then region = 3;
	if county in (16)			then region = 4;
	if county in (15)			then region = 5;
	if county in (13)			then region = 6;
	if county in (1,11,21)		then region = 7;
	if county in (5,9,14,17,20)	then region = 8;
	if county in (19,22,23)		then region = 9;
	if county in (4,8,18)		then region = 10;
	if county in (7,12)			then region = 11;
	if county in (10)			then region = 12;
	if county in (6)			then region = 13;

run;


/*** create output datasets ***/

	/** subset variables **/

data have_2;
	set have_1;

	format lh 1.;
	format lc 1.;
	format ulh 1.;
	format ulc 1.;

	if type in ('registered')	then lh = 1;
	if type in ('licensed')		then lc = 1;
	if type in ('illegal') and illegal_type in ('h')
								then ulh = 1;
	if type in ('illegal') and illegal_type in ('c')
								then ulc = 1;

run;

	/** tables **/

proc freq data = have_2;
	where type in ('registered');
		tables lh * region / nopercent norow nocol
			out = lh;
run;

proc freq data = have_2;
	where type in ('licensed');
		tables lc * region / nopercent norow nocol
			out = lc;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('h');
		tables ulh * region / nopercent norow nocol
			out = ulh;
run;

proc freq data = have_2;
	where type in ('illegal') and illegal_type in ('c');
		table ulc * region / nopercent norow nocol
			out = ulc;
run;


/*** export output to .xls ***/

proc export data = lh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lh';

proc export data = lc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'lc';

proc export data = ulh
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulh';

proc export data = ulc
	outfile = 'W:\Have\want.xls'
	dbms = excel replace;
	sheet = 'ulc';

run;

 

Thank you!!!!


 

kjowers
Fluorite | Level 6

Awesome! Thank you so much!!! I really appreciate your help!

Doctor J

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 7189 views
  • 1 like
  • 5 in conversation