- How to change a continuous variable to a categoric...

04-02-2015 12:35 PM

Hello,

I would like to know how to change a continuous variable into a categorical variable for chi-square analysis.

Would I use proc format or if-then statements? I am worried that if I use proc format I won't be able to run chi-square statistics, because the format would turn the numeric values of a variable such as age into character labels.

If-then statements make a bit more sense to me, since they create a new variable that I can use in the chi-square analysis.

For example,

Data example1;

set example;

age=newage;

If age= 15-18 then newage= 1;

If age=19-27 then new age =2;

if age > 27 then new age=3 ;

run;

newage would then be the variable I use in the chi-square analysis. Is this correct?

Is any syntax missing from this code?

Thank you!!

Accepted Solutions

Solution

04-02-2015 12:54 PM

Formats work. Your code has an extra line whose purpose I don't understand:

age=newage;

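It seems extraneous to me. Assuming newage does not already exist in the input data set, that line sets age to missing before any of the if statements run.

You also need to spell out each range explicitly, because age=15-18 is read as the arithmetic comparison age=-3, and the variable name cannot contain a space (newage, not new age). Use something like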

if age>=15 and age<=18 then newage=1;
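Putting those fixes together, a corrected version of the data step might look like the sketch below. It keeps the data set and variable names from the question; the else-if chain and the comments are my additions, and I am assuming that ages outside the listed ranges should be left ungrouped.

```sas
data example1;
    set example;
    /* Assign each age range to a numeric group code */
    if 15 <= age <= 18 then newage = 1;
    else if 19 <= age <= 27 then newage = 2;
    else if age > 27 then newage = 3;
    /* Ages below 15, missing ages, and fractional ages that fall
       between the ranges (for example 18.5) leave newage missing */
run;
```

Using else if means SAS stops checking once a range matches, and an age can never land in two groups.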

But to answer your question in a broader sense: I know people have done this on occasion, but changing a continuous variable into a category just so you can perform a chi-squared test is statistically inefficient, because grouping throws away information. It is better to use all the information you have in your analysis, which means treating age as a continuous variable. That should give you a more powerful analysis, and in that case the chi-squared test is not the appropriate method.
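As one possible illustration of keeping age continuous, here is a minimal sketch using logistic regression. It is only an assumption about the analysis: the question does not say what the other variable is, so outcome is a hypothetical binary variable coded 0/1, and a different method may suit your data better.

```sas
/* Hypothetical: outcome is a binary 0/1 variable in data set example */
proc logistic data=example;
    /* Model the probability that outcome = 1 as a function of age,
       with age entered as a continuous predictor */
    model outcome(event='1') = age;
run;
```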

All Replies

04-02-2015 02:22 PM

To expand on Paige's comments:

Create a custom format such as:

proc format;

value myagegroup

15 - 18 = '15 to 18 years'

19 - 27 = '19 to 27 years'

28 - high = '28 and older'

;

run;

Then, in the procedure that runs the analysis, you don't need to create a new variable. Use AGE itself and add this statement:

format age myagegroup. ;

This will create the groups to be used by the chi-square test AND give you usable value labels.

Note: if you have ages less than 15, each of those ages will appear as a separate group, because the format does not cover them.

One nice thing about this approach is that when someone asks "what happens if we use 15-23, 24-28, and then larger?" you only need to create a new format, not another variable. Also, if you create output data sets, for example with proc freq, the formatted variable will hold one value per group in the output, usually the lowest value that actually appears in that group.
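For example, a chi-square test with proc freq could be sketched as follows. The name outcome is hypothetical and stands for whatever categorical variable you want to cross with age; the sketch also assumes the myagegroup format above has already been created in the same session.

```sas
proc freq data=example;
    /* Cross-tabulate formatted age groups against the other variable
       and request the chi-square test */
    tables age*outcome / chisq;
    /* proc freq groups on the formatted values, so no new variable is needed */
    format age myagegroup.;
run;
```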
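If you would rather collect those younger ages into a single group, one option is to add a range for them. The sketch below uses a hypothetical format name, myagegroup2, and the label 'Under 15' is my own wording. It also uses -< so that fractional ages such as 18.5 do not fall between ranges, which assumes you want 18.5 counted with the 15 to 18 group.

```sas
proc format;
    value myagegroup2
        low -< 15  = 'Under 15'          /* everything below 15 (missing values excluded) */
        15  -< 19  = '15 to 18 years'    /* 15 up to, but not including, 19 */
        19  -< 28  = '19 to 27 years'    /* 19 up to, but not including, 28 */
        28  - high = '28 and older'
    ;
run;
```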

04-05-2015 09:58 AM

Thank you both for your assistance

04-05-2015 10:00 AM

Thanks so much for the detailed information. I appreciate it greatly!

04-05-2015 10:01 AM

Thank you! I didn't know you could reduce the strength of the analysis by categorizing variables. I will look into other bivariate analysis tests. Thank you