BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello friends,

I have a very big data set with 250 million rows and 20 variables.

I would like to create a data set with the following information:

For each variable that have less then 100 categories I want to see how many rows are in each category (also show category of missing value)

The wanted data set will have the following structure:

 

Ronein_0-1683280336742.png

May anyone show a nice code that do it?

Please remember that my raw data set  is very big as I mentioned before

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

A place to start:

ods select nlevels;
proc freq data=yourdatasetnamegoeshere nlevels;
  ods output nlevels=levelsdata;
run;

This will create a data set with the total number of levels, the number of missing levels (different missing values) and number of nonmissing levels.

I suspect that what you mean by "less than 100 categories" is looking for the NNonMissLevels variable in the created set unless "(also show category of missing value" means count the missing as well.

 

 

You can use the information from the Nlevels data to select variables as one of the variables in the table is TableVar that will have, with this example, the name of the variable. So filter where whichever definition of category fits to get the names of the variables.

 

Here's an example using the SASHELP.Class data set you should have to test code:

ods select nlevels;
proc freq data=sashelp.class nlevels;
  ods output nlevels=levelsdata;
run;

proc sql noprint;
   select tablevar into :varlist separated by ' '
   from levelsdata
   where nlevels le 100
   ;
quit;


proc freq data=sashelp.class;
  ods output onewayfreqs= rowcount;
  tables &varlist.; 
run;

data want; 
   set rowcount (rename=( frequency = NrRows));
   name= scan(table,2);
   category = left(coalescec( of f_:));
   keep name category nrrows;
run;

Caveat: the Category variable may need to have a length defined prior to use if YOUR values exceed 200 characters in length.

The  Proc SQL is one way to get a list of values from a data set into a macro variable that can be used elsewhere, such as a Tables statement in the example.

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

PROC FREQ with the LIST option ought to do this.

 

The fact that it is a huge table doesn't change the fact that you will have to run PROC FREQ to get that information. (Or possibly use hash objects)

--
Paige Miller
Ronein
Meteorite | Level 14
MAy you please show the code on sashelp.class for example? May you please get list on variables with not more then 100 categories and then perorm the required fruency only on them and put the information in one data set?
PaigeMiller
Diamond | Level 26

Surely you can write PROC FREQ code yourself

 

--
Paige Miller
ballardw
Super User

A place to start:

ods select nlevels;
proc freq data=yourdatasetnamegoeshere nlevels;
  ods output nlevels=levelsdata;
run;

This will create a data set with the total number of levels, the number of missing levels (different missing values) and number of nonmissing levels.

I suspect that what you mean by "less than 100 categories" is looking for the NNonMissLevels variable in the created set unless "(also show category of missing value" means count the missing as well.

 

 

You can use the information from the Nlevels data to select variables as one of the variables in the table is TableVar that will have, with this example, the name of the variable. So filter where whichever definition of category fits to get the names of the variables.

 

Here's an example using the SASHELP.Class data set you should have to test code:

ods select nlevels;
proc freq data=sashelp.class nlevels;
  ods output nlevels=levelsdata;
run;

proc sql noprint;
   select tablevar into :varlist separated by ' '
   from levelsdata
   where nlevels le 100
   ;
quit;


proc freq data=sashelp.class;
  ods output onewayfreqs= rowcount;
  tables &varlist.; 
run;

data want; 
   set rowcount (rename=( frequency = NrRows));
   name= scan(table,2);
   category = left(coalescec( of f_:));
   keep name category nrrows;
run;

Caveat: the Category variable may need to have a length defined prior to use if YOUR values exceed 200 characters in length.

The  Proc SQL is one way to get a list of values from a data set into a macro variable that can be used elsewhere, such as a Tables statement in the example.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 313 views
  • 1 like
  • 3 in conversation