Desktop productivity for business analysts and programmers

sorting in Proc Freq

Reply
N/A
Posts: 0

sorting in Proc Freq

Hi. I am using Proc Freq to get the number of counts for a character field. The field contains the text values -3.0-<0.0, 0.0-<2.0, 2.0-
I cannot creat a Proc Format for this as the values may vary and may not always fall within the predefined list. I have also tried changing Low to -9Low in a Data step which works, but it's processing intensive and not ideal. Any other ideas?
SAS Super FREQ
Posts: 8,818

Re: sorting in Proc Freq

Hi:
If the data are already in ORDER, then PROC FREQ will use the ORDER=DATA option to keep the data in that order. PROC FREQ only ORDERs the results based on the criteria you give it...ORDER=DATA, ORDER=FREQ, etc. It does not otherwise "sort" your data. There is a SAS procedure, PROC SORT, that does sorting. There are other procedures that would give you similar statistics, and those procedures do not need a sort, because they do their own sorting (PROC SQL, PROC TABULATE, PROC REPORT, PROC MEANS) -- but you may not easily get cumulative statistics such as cum freq or cum percent from these procedures.

If the data are NOT in that ORDER (which I suspect might be the case), then you would have to assign an ordering variable (if you don't want a format), do a PROC SORT on that ORDERING variable -- so you know that the data are in YOUR desired ORDER and then use PROC FREQ with the ORDER=DATA option.

Here's an example of how PROC FREQ works if the data are already in the order you want:

[pre]
data testord;
length cat $12 name $10;
infile datalines;
input cat $ name $;
return;
datalines;
Low-<-3.0 fred
Low-<-3.0 kermit
Low-<-3.0 george
-3.0-<0.0 ethel
-3.0-<0.0 gonzo
-3.0-<0.0 elroy
0.0-<2.0 lucy
0.0-<2.0 bigbird
0.0-<2.0 jane
2.0-<High ricky
2.0-<High elmo
2.0-<High judy;
run;

proc freq data=testord order=data;
tables cat;
run;
[/pre]

Produces:
[pre]
Cumulative Cumulative
cat Frequency Percent Frequency Percent
--------------------------------------------------------------
Low-<-3.0 2 33.33 2 33.33
-3.0-<0.0 1 16.67 3 50.00
0.0-<2.0 2 33.33 5 83.33
2.0-<High 1 16.67 6 100.00
[/pre]

There is a way to create a format from a SAS dataset, however, it sounds like you're trying to avoid formats.

One possible thing to do is to use the 2nd part of your string as the ordering piece...in this particular sequence of values - in -3 would sort before 0, which would sort before 2 which would sort before the H in High -- this is a sort of tricky data solution, because it depends on the 2nd values in your character variable to sort in collating sequence. That's what is shown in the example below. The SCAN function is extracting the part of the CAT character variable after the < sign and putting it into a new variable called ORDPIECE.

With this data (not in order) and this DATA step program to create an ORDPIECE variable on which to sort, then the PROC FREQ produces the same results as those shown above. If you knew that this technique worked for your particular character values, then you would get rid of the DATALINES section and the INFILE/INPUT statements and use a SET statement instead to point to your SAS dataset. The temporary file created, with the ORDPIECE variable, would be the one you would use for PROC FREQ.

cynthia
[pre]
data testord2;
length cat $12 name $10;
infile datalines;
input cat $ name $;
ordpiece = left(scan(cat,2,'<'));
return;
datalines;
Low-<-3.0 fred
-3.0-<0.0 ethel
0.0-<2.0 lucy
2.0-<High ricky
Low-<-3.0 kermit
-3.0-<0.0 gonzo
0.0-<2.0 bigbird
2.0-<High elmo
Low-<-3.0 george
-3.0-<0.0 elroy
0.0-<2.0 jane
2.0-<High judy
;
run;

proc print data=testord2;
title 'what is ordpiece';
run;

proc sort data=testord2;
by ordpiece;
run;

proc freq data=testord2 order=data;
title 'after sort';
tables cat;
run;
[/pre]
Ask a Question
Discussion stats
  • 1 reply
  • 195 views
  • 0 likes
  • 2 in conversation