- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm working with data that has millions of case IDs. As I apply various criteria to them, I want to know how many cases remain after each step. What I'm currently doing is using PROC SQL to get a count and then running PROC FREQ on the count variable. It would be a lot faster, easier, and neater to eliminate the PROC SQL step and just use PROC FREQ on the "have" dataset. PROC FREQ already produces the info I want anyway, but I don't want it displaying millions of case IDs just to get a single number at the bottom of the Cumulative Frequency column. Is there any way to alter PROC FREQ's output to get just this number?
PROC SQL;
Create table want_count
as Select count (case_ID) as case_ID_ct
From have;
QUIT;
PROC FREQ data=want_count;
TABLES case_ID_ct;
RUN;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm looking for a quick, efficient method to see how many cases remain after each criteria is added. I'm doing this many (15-20) times. If I can do it in the PROC FREQ step, that would eliminate ~20 SQLs. It's faster for me to write the code and it's faster to execute.
What I do like about my current method is that it produces a nice summary in the output so I can quickly and easily look at the counts change step-by-step. Looking up each file in order in the dictionary directory or opening a series of output files would be slower and clunkier.
Anyway, @ballardw already provided a great solution, but then apparently deleted it for some reason. This gives me a table in SAS's output window with one line that contains the variable name, the variable label, and the count of the unique values present. It's exactly what I'm looking for!
ODS select nlevels;
PROC FREQ data=have nlevels; TABLES case_ID;
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
TABLES case_ID_ct/noprint out=_counts_ cumcol;
Then via any one of a dozen or so methods, find the last row in data set _COUNTS_ and work with that only.
Its not clear to me from your description if a "case" can have two or more records. If you just want the number of records in a data set, PROC FREQ is certainly the wrong way to go. You can extract this information from the DICTIONARY tables that SAS maintains.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I really don't understand the Proc Freq step at all.
Are you trying to avoid Proc SqL with Count because of the time it takes to execute? If the number you want is the number of observations in the data set then perhaps:
PROC SQL; select nobs From dictionary.tables where libname='WORK' and memname='HAVE' ; QUIT;
SAS maintains views describing the current library and datasets (and options and more) that you can access in the Dictionary entrires. Dictionary.Tables has the basic details about data sets like names, labels creation and modification date and the number of observations.
The library and memname (data set name) are stored in upper case.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm looking for a quick, efficient method to see how many cases remain after each criteria is added. I'm doing this many (15-20) times. If I can do it in the PROC FREQ step, that would eliminate ~20 SQLs. It's faster for me to write the code and it's faster to execute.
What I do like about my current method is that it produces a nice summary in the output so I can quickly and easily look at the counts change step-by-step. Looking up each file in order in the dictionary directory or opening a series of output files would be slower and clunkier.
Anyway, @ballardw already provided a great solution, but then apparently deleted it for some reason. This gives me a table in SAS's output window with one line that contains the variable name, the variable label, and the count of the unique values present. It's exactly what I'm looking for!
ODS select nlevels;
PROC FREQ data=have nlevels; TABLES case_ID;
RUN;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I deleted the NLEVELS because I first thought you wanted the number of unique Case_id values in the data as useful but your description was "the cumulative count" from Proc freq. Those are different things.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do u mean that they are different because nlevels shows the number of unique values of case ID, while "cumulative total" might be a higher number if some case IDs appear more than once? Sorry for not clarifying. In this situation, I do indeed want a count of unique case IDs.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use the NOPRINT
option in PROC FREQ
along with the OUT=
option to store the frequencies in a dataset and then view the total number of rows:
proc freq data=have noprint;
tables case_ID / out=freq_count;
run;
proc print data=freq_count (obs=1);
var count;
run;
NOPRINT
: This suppresses the output of the frequency table.OUT=freq_count
: This creates a datasetfreq_count
that contains the frequencies.PROC PRINT
: This will print only the total count ofcase_ID
.