code to tell whether columns are continuous or not (vcont.sas)

Hi All,

I am working on a dataset that has more than 100 columns (variables), and I am writing code to programmatically classify each column as categorical or continuous. While browsing, I found the paper "Continuous or Not: How One Can Tell" (Paper 88-28, Vatsala Karwe, Mathematica Policy Research, Princeton, New Jersey, SUGI 28, Coder's Corner). This is exactly what I want to do.

The paper mentions a vcont.sas file, but I was not able to find the code. If anyone has already obtained it or written something similar, could you please let me know where I could download it, rather than reinventing the wheel?

Thanks in advance.


Accepted Solutions
Solution
‎04-04-2014 12:33 AM
Super User
Posts: 19,767

Re: code to tell whether columns are continuous or not (vcont.sas)

I don't know about that vcont.sas code, but one way to tell whether a variable is continuous is to look at its number of unique values. A categorical variable is likely to have only a few categories, while a continuous variable will have many. So you can set a cutoff: any variable with no more distinct values than the cutoff is treated as categorical.

You can do this with proc freq; change the macro variables to match your dataset and your desired cutoff.

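%let dset=sashelp.cars;      /* dataset to inspect */
%let n_level_cutoff=20;      /* more distinct values than this => continuous */

/* Display only the NLevels table and save it to a dataset */
ods select nlevels;
ods output nlevels=n_levels;
proc freq data=&dset nlevels;
  tables _all_;
run;

data cont_var;
  set n_levels;
  length vtype $11;          /* long enough to hold 'Categorical' */
  if nlevels>&n_level_cutoff then vtype='Continuous';
  else vtype='Categorical';
run;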
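
As a possible follow-up (a sketch only, not part of the original answer), the classified names could be collected into space-separated macro variables for use in later steps. The macro variable names cont_vars and cat_vars are assumptions made for illustration; TableVar is the column that holds the variable name in the NLevels output table.

```sas
/* Start with empty lists so the %put statements work even if a group is empty */
%let cont_vars=;
%let cat_vars=;

proc sql noprint;
  /* Names of variables classified as continuous */
  select TableVar into :cont_vars separated by ' '
    from cont_var
    where vtype = 'Continuous';

  /* Everything else is treated as categorical */
  select TableVar into :cat_vars separated by ' '
    from cont_var
    where vtype ne 'Continuous';
quit;

%put Continuous variables: &cont_vars;
%put Categorical variables: &cat_vars;
```

The lists can then be passed to statements that take a variable list, for example `var &cont_vars;` in proc means or `tables &cat_vars;` in proc freq.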


All Replies
New Contributor
Posts: 3

Re: code to tell whether columns are continuous or not (vcont.sas)

Hi Reeza,

Thank you so much, I was looking for similar code.

Since the dataset was huge, I considered only 1 million records for the categorization.
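
The thread does not show how the 1 million records were chosen, so the following is only a sketch of two ways it might be done. The macro variable n_sample, the dataset work.sample, and the seed value are assumptions made for illustration; proc surveyselect requires SAS/STAT.

```sas
%let dset=sashelp.cars;      /* replace with the large dataset */
%let n_sample=1000000;       /* number of records to examine */

/* Option 1: read only the first &n_sample rows.
   Fast, but can be unrepresentative if the data are sorted. */
ods select nlevels;
ods output nlevels=n_levels;
proc freq data=&dset(obs=&n_sample) nlevels;
  tables _all_;
run;

/* Option 2: draw a simple random sample first.
   SELECTALL keeps every row when the dataset has fewer than &n_sample rows. */
proc surveyselect data=&dset out=work.sample
                  method=srs n=&n_sample seed=12345 selectall noprint;
run;

ods select nlevels;
ods output nlevels=n_levels;
proc freq data=work.sample nlevels;
  tables _all_;
run;
```

Either way, the level counts describe the sample rather than the full data, so a variable whose count sits near the cutoff may deserve a second look.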

@ballardw,

There were such columns, and I handled them the way you mentioned. Thanks a lot.

Super User
Posts: 11,336

Re: code to tell whether columns are continuous or not (vcont.sas)

I would recommend looking at variable names and labels as well. Many coding systems in use have large numbers of codes, such as ZIP or other postal codes, account numbers, or client ID numbers. A frequency count may not flag these as categorical, especially if you have a dataset with many records. If you identify any such variables, exclude them from consideration.
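
A minimal sketch of that workflow, assuming the identifier-like variables are picked out by hand after reviewing the variable list. The macro variable id_vars is hypothetical, and Model in sashelp.cars is used purely as an illustration of a code-like column.

```sas
%let dset=sashelp.cars;
%let id_vars=Model;          /* hypothetical: identifier-like variables found on review */

/* Step 1: list names, types, formats and labels for a manual check */
proc contents data=&dset varnum;
run;

/* Step 2: count levels with the identifier-like variables dropped */
ods select nlevels;
ods output nlevels=n_levels;
proc freq data=&dset(drop=&id_vars) nlevels;
  tables _all_;
run;
```

Dropping the variables on input keeps them out of the NLevels table altogether, so they never reach the cutoff step.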

