Solved: Distinct values for all columns in dataset

PetePatel · Posted 01-29-2019 06:22 AM

Hi,

I have a large dataset (3 million records) with around 10,000 columns (variables).

I need to find the number of distinct values in each column, with one row per column showing its distinct count.

What is the most efficient way of generating these results for c. 10,000 columns?

Cheers

Sathish_jammy · Posted 01-29-2019 07:26 AM

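Try the code below to get the number of distinct values for each variable:

Proc sql;
/* COUNT(DISTINCT col) counts distinct non-missing values;
   plain COUNT(col) would count every non-missing row instead */
select count(distinct ID) as ID,
       count(distinct Name) as Name,
       count(distinct Num) as Num
from dataset_name;
Quit;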

PetePatel · Posted 01-29-2019 08:22 AM

Thanks. Is there a quicker way than writing a really long script that lists all 10,000 variables?

Ksharp · Posted 01-29-2019 08:42 AM

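ods select none;                 /* suppress printed output */
ods output nlevels=want;         /* save the NLevels table to data set WANT */
proc freq data=sashelp.class nlevels;
   table _all_ / missing;        /* one table per variable; MISSING treats missing values as valid levels */
run;
ods select all;                  /* restore printed output */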

jeffharris · Posted 04-01-2020 01:32 PM

Why does this work?

Ksharp · Posted 04-02-2020 07:15 AM

Did you run the code and check the WANT table?

Var       NLevels
Name           19
Sex             2
Age             6
Height         17
Weight         15
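As a quick cross-check of what NLevels means, the sketch below counts the distinct values of two of the same columns directly with PROC SQL. It assumes, as in SASHELP.CLASS, that the columns have no missing values: COUNT(DISTINCT) ignores missing values, whereas the PROC FREQ step can count a missing value as a level, so the two only agree for columns without missing values. The output table name CHECK is hypothetical.

```sas
/* Cross-check: count distinct values directly for two columns.  */
/* COUNT(DISTINCT ...) skips missing values, so this matches the */
/* NLevels results only when the column has no missing values.   */
proc sql;
   create table check as         /* CHECK is a hypothetical name */
   select count(distinct Sex) as Sex,
          count(distinct Age) as Age
   from sashelp.class;
quit;

proc print data=check noobs;
run;
```

The Sex and Age values in CHECK should match the Sex and Age rows of WANT. Writing one COUNT(DISTINCT) per column is fine for a handful of columns, but for 10,000 columns the single PROC FREQ step with TABLE _ALL_ avoids listing every variable by hand.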
