SAS Programming

Wolverine · Posted 08-29-2024 01:13 PM

I'm working with data that has millions of case IDs. As I apply various criteria to them, I want to know how many cases remain after each step. What I'm currently doing is using PROC SQL to get a count and then running PROC FREQ on the count variable. It would be a lot faster, easier, and neater to eliminate the PROC SQL step and just use PROC FREQ on the "have" dataset. PROC FREQ already produces the info I want anyway, but I don't want it displaying millions of case IDs just to get a single number at the bottom of the Cumulative Frequency column. Is there any way to alter PROC FREQ's output to get just this number?

PROC SQL;
   Create table want_count
      as Select count (case_ID) as case_ID_ct
   From have;
QUIT;

PROC FREQ data=want_count; 
     TABLES case_ID_ct;
RUN;

Wolverine · Posted 08-29-2024 02:01 PM

I'm looking for a quick, efficient method to see how many cases remain after each criteria is added. I'm doing this many (15-20) times. If I can do it in the PROC FREQ step, that would eliminate ~20 SQLs. It's faster for me to write the code and it's faster to execute.

What I do like about my current method is that it produces a nice summary in the output so I can quickly and easily look at the counts change step-by-step. Looking up each file in order in the dictionary directory or opening a series of output files would be slower and clunkier.

Anyway, @ballardw already provided a great solution, but then apparently deleted it for some reason. This gives me a table in SAS's output window with one line that contains the variable name, the variable label, and the count of the unique values present. It's exactly what I'm looking for!

ODS select nlevels;
PROC FREQ data=have nlevels; TABLES case_ID; 
RUN;

View solution in original post

PaigeMiller · Posted 08-29-2024 01:31 PM

TABLES case_ID_ct/noprint out=_counts_ cumcol;

Then via any one of a dozen or so methods, find the last row in data set _COUNTS_ and work with that only.

Its not clear to me from your description if a "case" can have two or more records. If you just want the number of records in a data set, PROC FREQ is certainly the wrong way to go. You can extract this information from the DICTIONARY tables that SAS maintains.

--
Paige Miller

ballardw · Posted 08-29-2024 01:42 PM

I really don't understand the Proc Freq step at all.

Are you trying to avoid Proc SqL with Count because of the time it takes to execute? If the number you want is the number of observations in the data set then perhaps:

PROC SQL;
  select nobs
  From dictionary.tables
  where libname='WORK' and memname='HAVE'
  ;
QUIT;

SAS maintains views describing the current library and datasets (and options and more) that you can access in the Dictionary entrires. Dictionary.Tables has the basic details about data sets like names, labels creation and modification date and the number of observations.

The library and memname (data set name) are stored in upper case.

Wolverine · Posted 08-29-2024 02:01 PM

I'm looking for a quick, efficient method to see how many cases remain after each criteria is added. I'm doing this many (15-20) times. If I can do it in the PROC FREQ step, that would eliminate ~20 SQLs. It's faster for me to write the code and it's faster to execute.

What I do like about my current method is that it produces a nice summary in the output so I can quickly and easily look at the counts change step-by-step. Looking up each file in order in the dictionary directory or opening a series of output files would be slower and clunkier.

Anyway, @ballardw already provided a great solution, but then apparently deleted it for some reason. This gives me a table in SAS's output window with one line that contains the variable name, the variable label, and the count of the unique values present. It's exactly what I'm looking for!

ODS select nlevels;
PROC FREQ data=have nlevels; TABLES case_ID; 
RUN;

ballardw · Posted 08-29-2024 02:55 PM

I deleted the NLEVELS because I first thought you wanted the number of unique Case_id values in the data as useful but your description was "the cumulative count" from Proc freq. Those are different things.

Wolverine · Posted 08-29-2024 03:03 PM

Do u mean that they are different because nlevels shows the number of unique values of case ID, while "cumulative total" might be a higher number if some case IDs appear more than once? Sorry for not clarifying. In this situation, I do indeed want a count of unique case IDs.

Sarath_A_SAS · Posted 09-03-2024 09:26 PM

You can use the NOPRINT option in PROC FREQ along with the OUT= option to store the frequencies in a dataset and then view the total number of rows:

proc freq data=have noprint;
tables case_ID / out=freq_count;
run;

proc print data=freq_count (obs=1);
var count;
run;

NOPRINT: This suppresses the output of the frequency table.
OUT=freq_count: This creates a dataset freq_count that contains the frequencies.
PROC PRINT: This will print only the total count of case_ID.

SAS Programming

Is it possible to get Proc Freq to display total number of rows ONLY?

Re: Is it possible to get Proc Freq to display total number of rows ONLY?

Re: Is it possible to get Proc Freq to display total number of rows ONLY?

Re: Is it possible to get Proc Freq to display total number of rows ONLY?

Re: Is it possible to get Proc Freq to display total number of rows ONLY?

Re: Is it possible to get Proc Freq to display total number of rows ONLY?

Re: Is it possible to get Proc Freq to display total number of rows ONLY?

Re: Is it possible to get Proc Freq to display total number of rows ONLY?

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...