Code to tell whether columns are continuous or not...

04-03-2014 11:30 PM

Hi All,

I am working on a data set that has more than 100 columns (variables), and I am writing code to programmatically classify each column as categorical or continuous. While browsing, I found the paper "Continuous or Not: How One Can Tell" (Paper 88-28, Vatsala Karwe, Mathematica Policy Research, Princeton, New Jersey, SUGI 28, Coder's Corner). This is exactly what I want to do.

The paper mentions a vcont.sas file, but I was not able to get the code. If anyone already has this file or has done something similar, could you please let me know where I could download it, rather than reinventing the wheel?

Thanks in advance.

Accepted Solutions

Solution

04-04-2014 12:33 AM

I don't know about that vcont.sas code, but one way to tell whether a variable is continuous is to look at the number of unique values (levels) it takes. A categorical variable is likely to have only a few categories, whereas a continuous variable will have many. So you can set a cutoff: a variable with that many levels or fewer is classified as categorical, and one with more levels is classified as continuous.

You can do this with the NLEVELS option of PROC FREQ; change the macro variables to point at your data set and your desired cutoff.

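```sas
%let dset=sashelp.cars;
%let n_level_cutoff=20;

/* Show only the "Number of Variable Levels" table and save it as a data set */
ods select nlevels;
ods output nlevels=n_levels;

proc freq data=&dset nlevels;
  table _all_;
run;

/* Classify each variable by its number of distinct levels */
data cont_var;
  set n_levels;
  length vtype $11;   /* long enough for 'Categorical' (otherwise truncated to 10 chars) */
  if nlevels > &n_level_cutoff then vtype='Continuous';
  else vtype='Categorical';
run;
```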
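A level count alone can label a character variable with many distinct values (for example, a free-text model name) as continuous. The sketch below is an assumption-laden refinement, not part of the accepted answer: it takes each variable's type from PROC CONTENTS and treats every character variable as categorical, applying the cutoff only to numeric variables. The data set names `var_info` and `var_class` are hypothetical.

```sas
/* Sketch (assumption): never label a character variable as continuous.
   var_info and var_class are hypothetical data set names. */
%let dset=sashelp.cars;
%let n_level_cutoff=20;

/* Level counts for every variable, saved from the NLevels ODS table */
ods select nlevels;
ods output nlevels=n_levels;
proc freq data=&dset nlevels;
  tables _all_;
run;

/* One row per variable; TYPE is 1 for numeric, 2 for character */
proc contents data=&dset out=var_info(keep=name type) noprint;
run;

/* Character variables are categorical; numeric ones use the cutoff */
proc sql;
  create table var_class as
  select n.TableVar, n.NLevels, v.type,
         case
           when v.type = 2 then 'Categorical'
           when n.NLevels > &n_level_cutoff then 'Continuous'
           else 'Categorical'
         end as vtype length=11
  from n_levels as n
       inner join var_info as v
       on upcase(n.TableVar) = upcase(v.name);
quit;
```

The join uses `upcase()` on both sides so that differences in the case of variable names between the two outputs do not drop rows.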

All Replies


04-06-2014 02:38 PM

Hi Reeza,

Thank you so much, I was looking for similar code.

Since the data set was huge, I used only 1 million records for the categorization.

@ballardw,

There were such columns, and I handled them the way you described. Thanks a lot!
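Limiting the level count to the first 1 million records can be done with the `OBS=` data set option on the PROC FREQ input. This is a minimal sketch; `max_obs` is a hypothetical macro variable. Note that `OBS=` reads the first N rows in storage order rather than a random sample, so if the data are sorted or grouped, some levels may be missed and a variable near the cutoff could be misclassified.

```sas
/* Sketch: compute level counts from the first &max_obs rows only.
   OBS= reads the first N observations in storage order; it is not a
   random sample, so sorted or grouped data can understate the levels. */
%let dset=sashelp.cars;
%let max_obs=1000000;   /* hypothetical macro variable for the row limit */

ods select nlevels;
ods output nlevels=n_levels;
proc freq data=&dset(obs=&max_obs) nlevels;
  tables _all_;
run;
```

The resulting `n_levels` data set can then be classified with the same DATA step as in the accepted solution.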


04-04-2014 11:00 AM

I would recommend looking at variable names and labels as well. Many coding systems use large numbers of codes, such as ZIP or other postal codes, account numbers, or client ID numbers. Such variables are identifiers or categories, but a frequency count may not flag them as categorical, especially in a data set with many records. If you identify any such variables, exclude them from consideration.
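Screening names and labels can be partly automated with the `DICTIONARY.COLUMNS` table in PROC SQL. The sketch below flags variables whose name or label matches an ID-like pattern; the regular expression and the macro variables `lib`, `mem`, and `id_vars` are assumptions for illustration, and the flagged list still needs a manual review.

```sas
/* Sketch: flag variables whose name or label suggests an ID or code.
   The regular expression is an assumption; extend it for your data. */
%let lib=SASHELP;   /* DICTIONARY stores library and member names in uppercase */
%let mem=CARS;
%let id_vars=;      /* stays empty if nothing is flagged */

proc sql noprint;
  create table id_like as
  select name, label
  from dictionary.columns
  where libname = "&lib" and memname = "&mem"
    and prxmatch('/zip|postal|\bid\b|_id\b|acct|account/i',
                 catx(' ', name, label)) > 0;

  /* Space-separated list of flagged names */
  select name into :id_vars separated by ' '
  from id_like;
quit;

%put NOTE: Variables to exclude: &id_vars;
```

When `&id_vars` is non-empty, it can be passed to a `DROP=` data set option before running the level count; an empty `DROP=` list would be a syntax error, so check it first.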