BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Wolverine
Pyrite | Level 9

I'm working with data that has millions of case IDs. As I apply various criteria to them, I want to know how many cases remain after each step. What I'm currently doing is using PROC SQL to get a count and then running PROC FREQ on the count variable. It would be a lot faster, easier, and neater to eliminate the PROC SQL step and just use PROC FREQ on the "have" dataset. PROC FREQ already produces the info I want anyway, but I don't want it displaying millions of case IDs just to get a single number at the bottom of the Cumulative Frequency column. Is there any way to alter PROC FREQ's output to get just this number?

 

PROC SQL;
   Create table want_count
      as Select count (case_ID) as case_ID_ct
   From have;
QUIT;

PROC FREQ data=want_count; 
     TABLES case_ID_ct;
RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
Wolverine
Pyrite | Level 9

I'm looking for a quick, efficient method to see how many cases remain after each criteria is added. I'm doing this many (15-20) times. If I can do it in the PROC FREQ step, that would eliminate ~20 SQLs. It's faster for me to write the code and it's faster to execute.

 

What I do like about my current method is that it produces a nice summary in the output so I can quickly and easily look at the counts change step-by-step. Looking up  each file in order in the dictionary directory or opening a series of output files would be slower and clunkier.

 

Anyway, @ballardw already provided a great solution, but then apparently deleted it for some reason. This gives me a table in SAS's output window with one line that contains the variable name, the variable label, and the count of the unique values present. It's exactly what I'm looking for!

ODS select nlevels;
PROC FREQ data=have nlevels; TABLES case_ID; 
RUN;

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26
TABLES case_ID_ct/noprint out=_counts_ cumcol;

Then via any one of a dozen or so methods, find the last row in data set _COUNTS_ and work with that only.

 

Its not clear to me from your description if a "case" can have two or more records. If you just want the number of records in a data set, PROC FREQ is certainly the wrong way to go. You can extract this information from the DICTIONARY tables that SAS maintains.

--
Paige Miller
ballardw
Super User

I really don't understand the Proc Freq step at all.

Are you trying to avoid Proc SqL with Count because of the time it takes to execute? If the number you want is the number of observations in the data set then perhaps:

PROC SQL;
  select nobs
  From dictionary.tables
  where libname='WORK' and memname='HAVE'
  ;
QUIT;

SAS maintains views describing the current library and datasets (and options and more) that you can access in the Dictionary entrires. Dictionary.Tables has the basic details about data sets like names, labels creation and modification date and the number of observations.

The library and memname (data set name) are stored in upper case.

 

Wolverine
Pyrite | Level 9

I'm looking for a quick, efficient method to see how many cases remain after each criteria is added. I'm doing this many (15-20) times. If I can do it in the PROC FREQ step, that would eliminate ~20 SQLs. It's faster for me to write the code and it's faster to execute.

 

What I do like about my current method is that it produces a nice summary in the output so I can quickly and easily look at the counts change step-by-step. Looking up  each file in order in the dictionary directory or opening a series of output files would be slower and clunkier.

 

Anyway, @ballardw already provided a great solution, but then apparently deleted it for some reason. This gives me a table in SAS's output window with one line that contains the variable name, the variable label, and the count of the unique values present. It's exactly what I'm looking for!

ODS select nlevels;
PROC FREQ data=have nlevels; TABLES case_ID; 
RUN;
ballardw
Super User

I deleted the NLEVELS because I first thought you wanted the number of unique Case_id values in the data as useful but your description was "the cumulative count" from Proc freq. Those are different things.

Wolverine
Pyrite | Level 9

Do u mean that they are different because nlevels shows the number of unique values of case ID, while "cumulative total" might be a higher number if some case IDs appear more than once? Sorry for not clarifying. In this situation, I do indeed want a count of unique case IDs.

Sarath_A_SAS
Obsidian | Level 7

You can use the NOPRINT option in PROC FREQ along with the OUT= option to store the frequencies in a dataset and then view the total number of rows:

proc freq data=have noprint;
tables case_ID / out=freq_count;
run;

proc print data=freq_count (obs=1);
var count;
run;

 

 

  • NOPRINT: This suppresses the output of the frequency table.
  • OUT=freq_count: This creates a dataset freq_count that contains the frequencies.
  • PROC PRINT: This will print only the total count of case_ID.

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 806 views
  • 0 likes
  • 4 in conversation