DATA Step, Macro, Functions and more

Simple distinct

Reply
Super Contributor
Posts: 673

Simple distinct

Hi All,
I have a small question.
Say I need to find the count of distinct values for each of the following in a dataset that has
more than a million records for my analysis:
complaint_id
indicator
dataentry_date
assignment_reassignment_date
complainant_type
contract_id
complaint_cat_id
category_description
complaint_subcat_id
subcategory_description

proc sql;
select count(distinct compliant_id) from table1;
quit;

I will have to run this multiple times and it will take time.I'm wondering if there is any way i can accomplish this in one step?
Trusted Advisor
Posts: 2,116

Re: Simple distinct

One way that only passes the big dataset once is to use PROC FREQ and output the one-way frequencies. You'll need to use one TABLES statement (with the OUT= option) for each variable. The record count for each output dataset is the number of unique values for each variable. Not elegant, but effective and reasonably efficient.

Doc Muhlbaier
Duke
Super User
Posts: 5,433

Re: Simple distinct

If you like you can use SQL as well, just add a count distinct for each of your columns. Not sure though how PROC SQL will process this internally regarding to efficiency.

/Linus
Data never sleeps
Contributor
Posts: 23

Re: Simple distinct

I just ran into this thread while looking for something else. Just for the record (almost certainly too late for the original poster), there is a very easy way to do this -- the NLEVELS option of PROC FREQ:

ods listing close;
ods output NLevels=CountOfDistinctValues;

proc freq data=sashelp.class nlevels;
tables _all_;

run;
ods output close;
ods listing;
Ask a Question
Discussion stats
  • 3 replies
  • 133 views
  • 0 likes
  • 4 in conversation