I have a set of 145 numeric variables whose counts I want to compare period to period. For example, I might want to compare January of this year to January of last year or January of this year to, say, April of last year. Easy enough to get the counts using "one way" tables in PROC FREQ, but when displayed side by side, the rows don't necessarily line up. If a particular period doesn't have any instances in a given range, then PROC FREQ will omit that range.
Here's an example:
PROC FORMAT;
VALUE TV_MON_20YM
0 = '0 months'
1-2 = '1 to 2 months'
3-4 = '3 to 4 months'
5-6 = '5 to 6 months'
7-8 = '7 to 8 months'
9-10 = '9 to 10 months'
11-12 = '11 to 12 months'
13 - 99999 = '> 12 months'
;
RUN;
PROC FREQ DATA=Comp_Lib.&Comp_File;
TABLES TV_PS_PSA008 / MISSING;
FORMAT TV_PS_PSA008 TV_MON_20YM.;
RUN;
The most common problem we experience is that there will be no instances of '0 months' for one of the periods being compared. The rows then don't line up period to period.
Is there a way that I can have SAS print a '0 months' row even when there are no occurrences that have a value of zero? The SPARSE option in PROC FREQ appears to be geared toward two-way tables. The PRELOADFMT option also appears to be geared toward two-way tables. I just have simple one-way tables. If someone could point me in the right direction, that would be most helpful.
Thank you,
Jim
This is how I would do it.
PROC FORMAT;
VALUE TV_MON_20YM(notsorted)
0 = '0 months'
1-2 = '1 to 2 months'
3-4 = '3 to 4 months'
5-6 = '5 to 6 months'
7-8 = '7 to 8 months'
9-10 = '9 to 10 months'
11-12 = '11 to 12 months'
13 - 99999 = '> 12 months'
;
RUN;
data compfile;
TV_PS_PSA008=3;
run;
proc summary data=compfile nway completetypes;
class TV_PS_PSA008 / preloadfmt order=data missing;
FORMAT TV_PS_PSA008 TV_MON_20YM.;
output out=counts;
run;
proc print;
run;
PROC FREQ DATA=counts order=data;
TABLES TV_PS_PSA008 / MISSING;
weight _freq_ / zeros;
FORMAT TV_PS_PSA008 TV_MON_20YM.;
RUN;
@jimbarbour wrote:
The SPARSE option in PROC FREQ appears to be geared toward two way tables. The PRELOADFMT option appears to also be geared toward two way tables.
PRELOADFMT would definitely solve your issue; SPARSE may not be helpful. Given your description of comparing things over time and periods, I'm not sure how you have a one-way table. You need to post some more sample data that reflects your problem; this isn't enough to illustrate it beyond the standard use of PRELOADFMT.
OK, @data_null__, that works. That gives me the results I need including the zero counts. It runs pretty fast with a couple of variables, but really really slow with more. I may have done something wrong there; not sure. If the slowness I'm noticing is just part and parcel of having 145 variables and about 800,000 - 1,000,000 records, then I can turn it into a macro or something and just put through a few variables at a time.
Jim
@Reeza, I'm just producing two sets of one-way frequency counts, one for the current period, one for a prior period. The one-way counts are then laid side-by-side for presentation purposes and Excel macros highlight any differences.
The solution that @data_null__ proposed is working, albeit slowly, so I won't post more detail at this juncture.
Thanks for your input,
Jim
@data_null__ wrote:
Ways 1;
Sorry, you lost me @data_null__
Do what?
Jim
OK @jimbarbour, WAYS 1 is what you need to get PROC SUMMARY to produce just the one-way tables.
Here is an example program that will handle up to 32767 class variables. The class variables can be either character or numeric, because the MLF option on the CLASS statement converts all class variables to character so they can be put in one array. A second data step then normalizes the class variables into _NAME_ and _VALUE_ and drops the original class variables. This is a more manageable data set, in my opinion. The output can be passed to PROC FREQ with a BY statement if you like.
PROC FORMAT;
VALUE TV_MON_20YM(notsorted)
0 = '0 months'
1-2 = '1 to 2 months'
3-4 = '3 to 4 months'
5-6 = '5 to 6 months'
7-8 = '7 to 8 months'
9-10 = '9 to 10 months'
11-12 = '11 to 12 months'
13 - 99999 = '> 12 months'
;
RUN;
data compfile;
do TV_PS_PSA008=3,.;
TV_PS_PSA006=TV_PS_PSA008;
TV_PS_PSA010=TV_PS_PSA008;
output;
end;
run;
proc summary data=compfile completetypes chartype;
class TV_PS_PSA: / preloadfmt order=data missing mlf;
FORMAT TV_PS_PSA: TV_MON_20YM.;
ways 1;
output out=counts;
run;
data counts;
length _order_ 8 _name_ $32 _value_ $64;
set counts;
array tv[*] TV_PS_PSA:;
drop tv:;
_I_ = indexc(_type_,'1');
_order_ = length(_type_)-_i_;
_name_ = vname(tv[_i_]);
_value_ = tv[_i_];
run;
proc print;
run;
proc freq data=counts order=data;
by _order_ _name_;
tables _value_;
weight _freq_ / zeros;
run;
The slowness is probably from generating the results. Try turning on the NOPRINT option or switching to listing output.
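A minimal sketch of the NOPRINT suggestion, assuming the normalized counts data set (with _order_, _name_, _value_ and _freq_) produced by the example program earlier in the thread. The output data set name freq_out is hypothetical. The idea is to skip rendering the tables and write the counts to a data set instead, which can then be printed or exported as needed:

```sas
/* Sketch only: assumes the COUNTS data set built by the earlier   */
/* example, with variables _order_, _name_, _value_ and _freq_.    */
/* FREQ_OUT is a hypothetical output data set name.                */
proc freq data=counts order=data noprint;
   by _order_ _name_;
   tables _value_ / out=freq_out;  /* write counts to a data set   */
   weight _freq_ / zeros;          /* keep zero-weight levels      */
run;

/* Spot-check a few rows rather than printing every table */
proc print data=freq_out(obs=20);
run;
```

Whether this helps depends on where the time is actually going. If the bottleneck is the size of the PROC SUMMARY output rather than the printing, NOPRINT alone will not fix it.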
What appears to have been happening in PROC SUMMARY, before I added the WAYS 1 statement at @data_null__'s suggestion, was that each variable's format categories were repeated in the output data set for every category of the preceding variable, which were in turn repeated for every category of the variable preceding that, and so on. The result looks suspiciously like a Cartesian product.
With just 5 variables with 20ish format categories each, I wound up with more than 11,000,000 combinations. It's hardly a wonder that it wouldn't work with 145 variables.
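To put rough numbers on this, assume purely for illustration that each of k class variables has the same number L of format categories. The value L = 26 is an assumption chosen to be consistent with "20ish" categories and a total above 11,000,000; it is not taken from the actual data. Fully crossing the variables gives L^k rows, while one-way tables need only k times L rows:

```latex
\[
N_{\text{crossed}} = L^{k}, \qquad N_{\text{one-way}} = k \cdot L
\]
\[
L = 26,\ k = 5:\quad 26^{5} = 11{,}881{,}376
\quad\text{versus}\quad 5 \cdot 26 = 130
\]
\[
L = 26,\ k = 145:\quad 26^{145} \approx 10^{205}
\quad\text{versus}\quad 145 \cdot 26 = 3{,}770
\]
```

Under that assumption, the crossed output grows exponentially with the number of variables, whereas the WAYS 1 output grows only linearly.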
Jim