I have a small question.
Say I need to find the count of distinct values for each of the following in a dataset that has
more than a million records for my analysis:
select count(distinct compliant_id) from table1;
I will have to run this multiple times and it will take time.I'm wondering if there is any way i can accomplish this in one step?
One way that only passes the big dataset once is to use PROC FREQ and output the one-way frequencies. You'll need to use one TABLES statement (with the OUT= option) for each variable. The record count for each output dataset is the number of unique values for each variable. Not elegant, but effective and reasonably efficient.
I just ran into this thread while looking for something else. Just for the record (almost certainly too late for the original poster), there is a very easy way to do this -- the NLEVELS option of PROC FREQ: