jimbarbour
Meteorite | Level 14

I have a set of 145 numeric variables whose counts I want to compare period to period.  For example, I might want to compare January of this year to January of last year or January of this year to, say, April of last year.  Easy enough to get the counts using "one way" tables in PROC FREQ, but when displayed side by side, the rows don't necessarily line up.  If a particular period doesn't have any instances in a given range, then PROC FREQ will omit that range.

 

Here's an example:

PROC FORMAT;
    VALUE TV_MON_20YM 
	0 = '0 months'
	1-2 = '1 to 2 months'
	3-4 = '3 to 4 months'
	5-6 = '5 to 6 months'
	7-8 = '7 to 8 months'
	9-10 = '9 to 10 months'
	11-12 = '11 to 12 months'
	13 - 99999 = '> 12 months'
	;

RUN;

PROC	FREQ	DATA=Comp_Lib.&Comp_File;
	TABLES	TV_PS_PSA008	/	MISSING;
	FORMAT	TV_PS_PSA008	TV_MON_20YM.;
RUN;

The most common problem we experience is that there will be no instances of '0 months' for one of the periods being compared. The rows then don't line up period to period.  

 

Is there a way that I can have SAS print a '0 months' row even when there are no occurrences that have a value of zero?  The SPARSE option in PROC FREQ appears to be geared toward two-way tables.  The PRELOADFMT option also appears to be geared toward two-way tables.  I just have simple one-way tables.  If someone could point me in the right direction, that would be most helpful.

 

Thank you,

 

Jim

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

This is how I would do it.

 

PROC FORMAT;
   VALUE TV_MON_20YM(notsorted)
      0 = '0 months'
      1-2 = '1 to 2 months'
      3-4 = '3 to 4 months'
      5-6 = '5 to 6 months'
      7-8 = '7 to 8 months'
      9-10 = '9 to 10 months'
      11-12 = '11 to 12 months'
      13 - 99999 = '> 12 months'
   ;
   RUN;
data compfile;
   TV_PS_PSA008=3;
   run;
proc summary data=compfile nway completetypes;
   class TV_PS_PSA008 / preloadfmt order=data missing;
   FORMAT	TV_PS_PSA008	TV_MON_20YM.;
   output out=counts;
   run;
proc print;
   run;
PROC	FREQ	DATA=counts order=data;
   TABLES	TV_PS_PSA008	/	MISSING;
   weight _freq_ / zeros;
   FORMAT	TV_PS_PSA008	TV_MON_20YM.;
   RUN;


10 REPLIES
Reeza
Super User

@jimbarbour wrote:

 The SPARSE option in PROC FREQ appears to be geared toward two way tables.  The PRELOADFMT option appears to also be geared toward two way tables.  


PRELOADFMT would definitely solve your issue; SPARSE may not be helpful. Given your description of comparing things over time and periods, I'm not sure how you have a one-way table. You need to post some more sample data that reflects your problem. This isn't enough to illustrate it beyond the standard "use PRELOADFMT" answer.

 

 


jimbarbour
Meteorite | Level 14

OK, @data_null__, that works. That gives me the results I need including the zero counts.  It runs pretty fast with a couple of variables, but really really slow with more.  I may have done something wrong there; not sure. If the slowness I'm noticing is just part and parcel of having 145 variables and about 800,000 - 1,000,000 records, then I can turn it into a macro or something and just put through a few variables at a time.

 

Jim

jimbarbour
Meteorite | Level 14

@Reeza, I'm just producing two sets of one-way frequency counts, one for the current period, one for a prior period.  The one-way counts are then laid side-by-side for presentation purposes and Excel macros highlight any differences.

 

The solution that @data_null__ proposed is working, albeit slowly, so I won't post more detail at this juncture.

 

Thanks for your input,

 

Jim

data_null__
Jade | Level 19
Ways 1;

jimbarbour
Meteorite | Level 14

@data_null__ wrote:
Ways 1;


Sorry, you lost me @data_null__

 

Do what?

 

Jim

data_null__
Jade | Level 19

@jimbarbour wrote:

@data_null__ wrote:
Ways 1;


Sorry, you lost me @data_null__

 

Do what?

 

Jim


OK @jimbarbour, the WAYS 1 statement is what you need to get PROC SUMMARY to produce just the 1-way tables.

 

http://support.sas.com/documentation/cdl/en/proc/69850/HTML/default/viewer.htm#n1affq2dctdc8un1eokb5...

 

Here is an example program that will handle up to 32767 class variables.  The class variables can be either char or numeric, because the MLF option on the CLASS statement converts all class variables to character so they can be put in a single array. A second data step then normalizes the class variables into _NAME_ and _VALUE_ and drops the original class variables.  This is a more manageable data set, in my opinion.  The output can be passed to PROC FREQ with a BY statement if you like.

 

PROC FORMAT;
   VALUE TV_MON_20YM(notsorted)
      0 = '0 months'
      1-2 = '1 to 2 months'
      3-4 = '3 to 4 months'
      5-6 = '5 to 6 months'
      7-8 = '7 to 8 months'
      9-10 = '9 to 10 months'
      11-12 = '11 to 12 months'
      13 - 99999 = '> 12 months'
   ;
   RUN;
data compfile;
   do TV_PS_PSA008=3,.;
      TV_PS_PSA006=TV_PS_PSA008;
      TV_PS_PSA010=TV_PS_PSA008;
      output;
      end;
   run;
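proc summary data=compfile completetypes chartype;
   /* CHARTYPE makes _TYPE_ a character string of 0s and 1s, one position per class variable */
   class TV_PS_PSA: / preloadfmt order=data missing mlf;
   FORMAT TV_PS_PSA:	TV_MON_20YM.;
   ways 1;   /* one-way tables only, no crossings between class variables */
   output out=counts;
   run;
data counts;
   length _order_ 8 _name_ $32 _value_ $64;
   set counts;
   array tv[*] TV_PS_PSA:;
   drop tv:;
   _i_ = indexc(_type_,'1');        /* position of the one class variable active in this row */
   _order_ = length(_type_)-_i_;    /* increases with _TYPE_, so it follows the row order of COUNTS */
   _name_  = vname(tv[_i_]);
   _value_ = tv[_i_];
   run;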
proc print;
   run;
proc freq data=counts order=data;
   by _order_ _name_;
   tables _value_;
   weight _freq_ / zeros;
   run;
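
For the side-by-side comparison described in the original question, one possible follow-up (a sketch only, not part of the posted solution) is to run the steps above once per period and join the two normalized data sets. The data set names counts_current and counts_prior, and the output name compare, are hypothetical; both inputs are assumed to have the _name_, _value_ and _freq_ columns produced above.

```sas
/* Sketch only: counts_current and counts_prior are hypothetical names for
   two data sets shaped like COUNTS above, one per period. */
proc sql;
   create table compare as
   select coalesce(cur._name_,  pri._name_)  as _name_,
          coalesce(cur._value_, pri._value_) as _value_,
          coalesce(cur._freq_, 0)            as freq_current,
          coalesce(pri._freq_, 0)            as freq_prior
   from counts_current as cur
        full join
        counts_prior   as pri
     on  cur._name_  = pri._name_
     and cur._value_ = pri._value_;
quit;
```

The full join keeps a row even if a level shows up in only one period (a missing-value level, for example). PROC SQL does not guarantee row order, so an ORDER BY or a later sort may be needed before presentation.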


Reeza
Super User

The slowness is probably from generating the results. Try turning on the NOPRINT option or switching to the listing output.
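
As a rough illustration of the NOPRINT idea applied to the final PROC FREQ step from the earlier reply (a sketch, assuming the normalized COUNTS data set from that reply; the output data set name freq_out is hypothetical):

```sas
/* Sketch only: freq_out is a hypothetical output data set name. */
proc freq data=counts order=data noprint;   /* NOPRINT suppresses the displayed tables */
   by _order_ _name_;
   tables _value_ / out=freq_out;           /* write COUNT and PERCENT to a data set instead */
   weight _freq_ / zeros;                   /* ZEROS lets zero-weight levels be counted */
   run;
```

It is worth checking that the zero-count rows actually appear in freq_out before relying on it.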

jimbarbour
Meteorite | Level 14

@Reeza,  Ah.  Good idea.  I will try that.

 

Thank you,

 

Jim

jimbarbour
Meteorite | Level 14

@Reeza,

 

What appears to have been happening in PROC SUMMARY, before I added the WAYS 1 statement at @data_null__'s suggestion, was that in the output data set each variable's format categories were repeated for each format category of the preceding variable, which were in turn repeated for each category of the variable preceding that, and so on, winding up with something that looks suspiciously like a Cartesian product.  

 

With just 5 variables with 20ish format categories each, I wound up with more than 11,000,000 combinations.  It's hardly a wonder that it wouldn't work with 145 variables.

 

Jim
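
To put rough numbers on that growth (a sketch, not from the thread): with k class variables of c format categories each, the fully crossed n-way table alone has c**k cells, while WAYS 1 needs only k*c rows. The value c = 20 is an assumption taken from the "20ish" above, so the result will not match the 11,000,000 figure exactly.

```sas
/* Sketch only: c = 20 is an assumed category count, not a measured one. */
data _null_;
   k = 5;                     /* number of class variables in the test run */
   c = 20;                    /* assumed format categories per variable */
   nway_rows  = c**k;         /* rows in the fully crossed n-way table */
   ways1_rows = k*c;          /* rows when only one-way tables are requested */
   put nway_rows= comma15. ways1_rows= comma15.;
run;
```

Under those assumptions the crossed table has 20**5 = 3,200,000 cells against 100 one-way rows. For 145 variables the one-way total is still only 145*20 = 2,900 rows, whereas the crossed count is far beyond anything that could be materialized.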

