Hi,
I have a large dataset (3 million records) with around 10,000 columns (variables).
I need to find the number of distinct values in each column as shown below:
From this:
| ID | Name | Num |
| 123 | last name | 10000 |
| 123 | last name | 20000 |
| 345 | s drop | 30000 |
| 456 | s drop | 40000 |
| 123 | s drop | 40000 |
To this:
| ID | 3 |
| Name | 2 |
| Num | 4 |
What is the most efficient way of generating these results for roughly 10,000 columns?
Cheers
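Try the code below to get the number of distinct values for each variable:
proc sql;
  /* count(distinct x) counts unique non-missing values; plain count(x) only counts non-missing rows */
  select count(distinct ID) as ID, count(distinct Name) as Name, count(distinct Num) as Num
  from dataset_name;
quit;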
Thanks, is there a quicker way than having a really long script with 10,000 vars?
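ods select none;            /* suppress printed output while the step runs */
ods output nlevels=want;    /* save the NLevels table to the data set WANT */
proc freq data=sashelp.class nlevels;
  tables _all_ / missing;   /* every variable; missing values count as a level */
run;
ods select all;             /* restore normal output */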
Why does this work?
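As far as I understand it, the NLEVELS option makes PROC FREQ produce an extra table with one row per variable giving its number of distinct levels, and ODS OUTPUT saves that table to a data set instead of printing it. `_all_` requests every variable, so nothing has to be typed out per column. A minimal sketch to inspect the result, assuming the saved data set is called `want` as in the post above and that the level counts sit in the columns `TableVar` and `NLevels` (the names I would expect for this ODS table):

```sas
/* Capture the NLevels table, then print one row per variable. */
ods select none;
ods output nlevels=want;
proc freq data=sashelp.class nlevels;
  tables _all_ / missing;
run;
ods select all;

/* TableVar holds the variable name, NLevels its number of distinct values. */
proc print data=want noobs;
  var TableVar NLevels;
run;
```

For your own data, replace sashelp.class with your data set name.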