Solved
New Contributor
Posts: 3

# code to tell whether columns are continuous or not (vcont.sas)

Hi All,

I am working on a dataset that has more than 100 columns (variables), and I am writing code to programmatically classify each column as categorical or continuous. While browsing, I found the paper "Continuous or Not: How One Can Tell" (Paper 88-28, Vatsala Karwe, Mathematica Policy Research, Princeton, New Jersey, SUGI 28, Coder's Corner). This is exactly what I want to do.

The paper mentions a vcont.sas file, but I was not able to find the code. If anyone has already obtained it or written something similar, could you please let me know where I could download it, rather than reinventing the wheel?

Accepted Solutions
Solution
‎04-04-2014 12:33 AM
Super User
Posts: 23,773

## Re: code to tell whether columns are continuous or not (vcont.sas)

I don't know about the vcont.sas code, but one way to tell whether a variable is continuous is to look at its number of unique values. A categorical variable is likely to have only a few categories, whereas a continuous variable will have many. So you can set a cutoff on the number of levels and classify any variable at or below it as categorical.

You can do this with PROC FREQ. Change the macro variables to point to your dataset and your desired cutoff.

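    /* Dataset to scan and the level-count cutoff */
    %let dset=sashelp.cars;
    %let n_level_cutoff=20;

    /* Print only the NLevels table and save it as a dataset */
    ods select nlevels;
    ods output nlevels=n_levels;

    proc freq data=&dset nlevels;
        table _all_;
    run;

    data cont_var;
        set n_levels;
        /* Set the length explicitly so 'Categorical' is not truncated */
        length vtype $11;
        if nlevels>&n_level_cutoff then vtype='Continuous';
        else vtype='Categorical';
    run;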
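
A possible refinement of the cutoff idea: a level-count cutoff on its own would label a character variable with many distinct values (such as `Model` in `sashelp.cars`) as continuous. The sketch below assumes that character variables should always be treated as categorical, and uses PROC CONTENTS to look up each variable's type. The dataset names `var_types` and `cont_var_typed` are made up for illustration.

```sas
%let dset=sashelp.cars;
%let n_level_cutoff=20;

/* Save the NLevels table; NOPRINT suppresses the one-way frequency tables */
ods output nlevels=n_levels;
proc freq data=&dset nlevels;
   tables _all_ / noprint;
run;

/* TYPE is 1 for numeric and 2 for character variables */
proc contents data=&dset out=var_types(keep=name type) noprint;
run;

/* Assumption: only numeric variables can be continuous */
proc sql;
   create table cont_var_typed as
   select l.tablevar as name,
          l.nlevels,
          case
             when t.type=1 and l.nlevels>&n_level_cutoff then 'Continuous'
             else 'Categorical'
          end as vtype length=11
   from n_levels as l
        inner join var_types as t
        on upcase(l.tablevar)=upcase(t.name);
quit;
```

Whether numeric variables with few levels (for example `Cylinders`) should count as categorical is still a judgment call, so the cutoff remains something to tune.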

All Replies
New Contributor
Posts: 3

## Re: code to tell whether columns are continuous or not (vcont.sas)

Hi Reeza,

Thank you so much - I was looking for similar code.

Since the dataset was huge, I used only 1 million records for the categorization.
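
One way to restrict the level count to a subset of rows is the `OBS=` dataset option. This is a minimal sketch, not necessarily what was done here: `sashelp.cars` is a stand-in for the large dataset, and `max_obs` and `n_levels_sample` are made-up names. Note that `OBS=` reads the first rows rather than a random sample, so it assumes the file is not sorted in a way that would bias the counts.

```sas
%let dset=sashelp.cars;   /* stand-in for the large dataset */
%let max_obs=1000000;     /* read at most this many rows */

/* Count levels on the first &max_obs rows only */
ods output nlevels=n_levels_sample;
proc freq data=&dset(obs=&max_obs) nlevels;
   tables _all_ / noprint;
run;
```

If a random sample is preferred, PROC SURVEYSELECT with METHOD=SRS can draw one into a work dataset first.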

@ballardw,

There were such columns, and I handled them the way you suggested. Thanks a lot.

Super User
Posts: 13,583

## Re: code to tell whether columns are continuous or not (vcont.sas)

I would recommend looking at variable names and labels as well. Many coding systems have large numbers of codes, such as ZIP or other postal codes, account numbers, or client ID numbers, and a frequency-count cutoff may not flag these as categorical, especially in a dataset with many records. If you identify any such variables, exclude them from consideration.
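
A rough sketch of how such a name and label check could be automated. The demo dataset, the output names `var_info` and `id_like`, and the keyword list in the regular expressions are all assumptions for illustration; the patterns would need adjusting to the coding systems in your own data.

```sas
/* Hypothetical demo data with two code-like variables */
data work.demo;
   input client_id zip_code $ income;
   label client_id='Client ID number'
         zip_code='Postal code';
   datalines;
101 08540 52000
102 10001 61000
103 60601 48000
;
run;

/* Collect variable names and labels */
proc contents data=work.demo out=var_info(keep=name label) noprint;
run;

/* Keep variables whose name or label matches the assumed keyword list */
data id_like;
   set var_info;
   if prxmatch('/zip|postal|account|acct|(^|_)id(_|$)/i', strip(name))
      or prxmatch('/zip|postal|account|\bid\b/i', label);
run;
```

On the demo data this should keep `client_id` and `zip_code` and drop `income`. The variables listed in `id_like` can then be reviewed by hand and excluded before applying a level-count cutoff.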

🔒 This topic is solved and locked.