BookmarkSubscribeRSS Feed
teelov
Quartz | Level 8

i have an number of datasets, ranging from 1000 rows to 300,000,000, 10 variables to 256

 

on a proceedings paper i read that:

 

1 % - 15% An index will definitely improve program performance

16% - 20% An index will probably improve program performance

 

so with that in mind i need to ask the question and create some code which:

Returns a list of variables that meet the criteria where then number of distinct values on given variables that amount to 20% or less of the total row count of that table

 

I'm not sure how to approach this as it will take some stress on the server if i just use proc freq?

 

advice?

1 REPLY 1
ballardw
Super User

@teelov wrote:

i have an number of datasets, ranging from 1000 rows to 300,000,000, 10 variables to 256

 

on a proceedings paper i read that:

 

1 % - 15% An index will definitely improve program performance

16% - 20% An index will probably improve program performance

 

so with that in mind i need to ask the question and create some code which:

Returns a list of variables that meet the criteria where then number of distinct values on given variables that amount to 20% or less of the total row count of that table

 

I'm not sure how to approach this as it will take some stress on the server if i just use proc freq?

 

advice?


I would be tempted to start with proc freq with the nlevels option which reports the number of non-missing and missing levels for variables. Candidates would be those that have no missing levels and relatively large number of non-missing levels.

Example:

ods select nlevels;
Proc freq data=sashelp.class nlevels;
run;

The ods select only displays the neleves information. You can use the ODS OUTPUT to save that into a data set if needed.

 

I would look for variables that are more categorical  in nature, such as NAME in the above, than measured such as HEIGHT.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 873 views
  • 0 likes
  • 2 in conversation