turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- how to create a dummy variable for an existing var...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

09-11-2016 09:38 PM - edited 09-11-2016 10:23 PM

Hi,

I have a data with over 20 variables and over a 100 rows of data.

For each variable I would like to create a dummy variable which depends on the median of the values of that variable. Here is an example to illustrate:

I have a column named "age" containing ages. I would like to calculate the median age, and suppose it equals 29, so if for an entry the value for age is greater than the median, say 35, then the new variable "age_dummy" will be equal to 1, and if age is less than or equal to the median, like 25, than the dummy will be equal to 0.

so if the initial data is something like this:

age |

25 |

29 |

31 |

35 |

Then I would like the final data to be of the following:

age | age_dummy |

25 | 0 |

29 | 0 |

31 | 1 |

35 | 1 |

Thank you!

Accepted Solutions

Solution

09-11-2016
11:59 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ilikesas

09-11-2016 10:25 PM - edited 09-11-2016 10:27 PM

Use the RANK procedure

```
proc rank data=have out=want groups=2;
var var1-var20;
ranks rvar1-rvar20;
run;
```

Below and above median will be identified with group number 0 and 1, respectively.

PG

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ilikesas

09-11-2016 09:51 PM

So which part is your question?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

09-11-2016 10:12 PM

Hi Reeza,

for each column I would like to calculate the median of the numbers, and then for each colum create a new column which will consist of dummy values (0 or 1). So basically there are 2 subquestions: calculating the median of a column and creating a new column of dummy variables.

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

09-11-2016 10:23 PM

Here I would like to add some illustration:

suppose the initial data is something like this:

age |

25 |

29 |

31 |

35 |

Then I would like the final data to be of the following:

age | age_dummy |

25 | 0 |

29 | 0 |

31 | 1 |

35 | 1 |

And please note that I have over 20 variables so if possible to make the entire process automatic for all the variables.

Thanks!

Solution

09-11-2016
11:59 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ilikesas

09-11-2016 10:25 PM - edited 09-11-2016 10:27 PM

Use the RANK procedure

```
proc rank data=have out=want groups=2;
var var1-var20;
ranks rvar1-rvar20;
run;
```

Below and above median will be identified with group number 0 and 1, respectively.

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

09-11-2016 11:00 PM

Hi PGStats,

thank you for the reply, I clearly got the dummy values depending on the median.

now I get the dummy columns but the original columsn disappear, and I need the original columns and I can't just merge becasue both the original columns and their corresponding new dummy columns have the same name.

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ilikesas

09-11-2016 11:23 PM

If you used the RANK statement that's not what should have happened.

Please post your code.

Also verify that the variable name and labels are different.

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473611.htm

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

09-11-2016 11:58 PM

That's good now I realized that I forgot to put the "r" before the original varible name

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ilikesas

09-11-2016 11:23 PM

Dataset **want** should contain original variables (**var1-var20**) __and__ new variables** rvar1-rvar20**, just as you need. I would need to see your log to understand what happened.

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

09-11-2016 11:58 PM

That's good now I realized that I forgot to put the "r" before the original varible name!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ilikesas

09-12-2016 12:20 PM

I posted what I think is a more comprehensive 'R' based solution. This solution should map easly to IML.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to rogerjdeangelis

09-12-2016 10:08 PM

Hi rogerjdeangelis,

I would love to see the solution, but when I clicked on the link I was asked for a password, so is there another way of accessing your solution?

Thnak you!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to ilikesas

09-22-2017 01:19 AM

Hello I am currently trying to create dummy variables for companies within countries.

This is my code below:

DATA WORK.IMPORT;

SET WORK.IMPORT;

IF COMPANY = COKE THEN COKED=1; ELSE COKED = 0;

IF COMPANY = KFC THEN KFCD=1; ELSE KFCD = 0;

IF COMPANY = PEPSI THEN PEPSID=1; ELSE PEPSID = 0;

IF COMPANY = NIKE THEN NIKED=1; ELSE NIKED = 0;

IF COMPANY = STARBUCKS THEN STARBUCKSD=1; ELSE STARBUCKSD = 0;

IF COMPANY = MCDONALD THEN MCDONALDD=1; ELSE MCDONALDD = 0;

IF COMPANY = HARDEES THEN HARDEESD=1; ELSE HARDEESD = 0;

RUN;

unfortunately the ouput for the dummy variables is all 0, so if the company is coke the output is also 0 and not 1

I don't understand what is wrong, please help.

Thanks

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to aalasady0

09-22-2017 01:40 AM - edited 09-22-2017 01:41 AM

@aalasady0, you would draw more attention if you posted your question as a new topic.

Try using statements like

COKED = upcase(COMPANY) = "COKE";

KFCD = upcase(COMPANY) = "KFC";

etc.

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

09-22-2017 01:44 AM

Thank you for your reply, I apologize for my ignorence but I am very new to SAS.

1.) What does this statement do?

2.) Where would this fit in my code?

Thanks