Posted 01-29-2019 06:22 AM
Hi,
I have a large dataset (3 million records) with around 10,000 columns (variables).
I need to find the number of distinct values in each column, as shown below.
From this:
ID | Name | Num |
123 | last name | 10000 |
123 | last name | 20000 |
345 | s drop | 30000 |
456 | s drop | 40000 |
123 | s drop | 40000 |
To this:
ID | 3 |
Name | 2 |
Num | 4 |
What is the most efficient way of generating these results for roughly 10,000 columns?
Cheers
1 ACCEPTED SOLUTION
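/* Suppress printed output; only the captured dataset is needed. */
ods select none;
/* Save the "Number of Variable Levels" table to the dataset WANT. */
ods output nlevels=want;
proc freq data=sashelp.class nlevels;
   /* _all_ covers every variable; MISSING counts missing as a level. */
   tables _all_ / missing;
run;
/* Restore normal ODS output. */
ods select all;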
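As a rough sketch of how this might be applied to the data in the question, the step below rebuilds the five sample rows and runs the same PROC FREQ call against them. The dataset names work.have and work.want are placeholders, not names from the original post. The NOPRINT option on the TABLES statement is an addition that is not in the accepted answer; as far as I know it skips displaying the one-way frequency tables while still producing the NLevels table, which may help with very wide data, but treat that as an assumption to verify on your own system.

```sas
/* Hypothetical sample data, copied from the question. */
data work.have;
   infile datalines dlm='|' truncover;
   input ID Name :$20. Num;
   datalines;
123|last name|10000
123|last name|20000
345|s drop|30000
456|s drop|40000
123|s drop|40000
;
run;

ods select none;                     /* no printed output */
ods output nlevels=work.want;        /* capture the NLevels table */
proc freq data=work.have nlevels;
   /* NOPRINT is an assumption added here, not part of the accepted answer. */
   tables _all_ / missing noprint;
run;
ods select all;

/* Inspect the captured counts. */
proc print data=work.want noobs;
run;
```

For this sample the counts should match the table the question asks for (ID 3, Name 2, Num 4). PROC FREQ determines levels from formatted values, so a format attached to a variable can change its count.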
5 REPLIES
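Try the code below to count the distinct values of each variable:
proc sql;
   /* count(distinct x) counts unique non-missing values; plain count(x) would count rows. */
   select count(distinct ID) as ID,
          count(distinct Name) as Name,
          count(distinct Num) as Num
   from dataset_name;
quit;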
Thanks, is there a quicker way than having a really long script with 10,000 vars?
Why does this work?
Did you run the code and check the WANT table?
Var NLevels
Name 19
Sex 2
Age 6
Height 17
Weight 15
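With thousands of columns, the WANT table itself becomes long, so it may help to sort it before reading it. The sketch below assumes the captured ODS table stores the variable name in TableVar and the count in NLevels, which is my understanding of how PROC FREQ names them; check the actual column names with PROC CONTENTS if this fails. The output dataset name work.want_sorted is a placeholder.

```sas
ods select none;
ods output nlevels=work.want;
proc freq data=sashelp.class nlevels;
   tables _all_ / missing;
run;
ods select all;

/* Put the variables with the most distinct values first. */
proc sort data=work.want out=work.want_sorted;
   by descending NLevels;
run;

/* TableVar and NLevels are assumed column names; verify with PROC CONTENTS. */
proc print data=work.want_sorted noobs;
   var TableVar NLevels;
run;
```

Given the counts reported above, Name would be listed first and Sex last.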