BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ilikesas
Barite | Level 11

Hi,

 

I have a data with over 20 variables and over a 100 rows of data.

 

For each variable I would like to create a dummy variable which depends on the median of the values of that variable. Here is an example to illustrate:

 

I have a column named "age" containing ages. I would like to calculate the median age, and suppose it equals 29, so if for an entry the value for age is greater than the median, say 35, then the new variable "age_dummy" will be equal to 1, and if age is less than or equal to the median, like 25, than the dummy will be equal to 0.

 

so if the initial data is something like this:

age
25
29
31
35

 

Then I would like the final data to be of the following:

age age_dummy
25 0
29 0
31 1
35 1

 

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Use the RANK procedure

 

proc rank data=have out=want groups=2;
var var1-var20;
ranks rvar1-rvar20;
run;

Below and above median will be identified with group number 0 and 1, respectively.

PG

View solution in original post

17 REPLIES 17
Reeza
Super User

So which part is your question? 

ilikesas
Barite | Level 11

Hi Reeza,

 

for each column I would like to calculate the median of the numbers, and then for each colum create a new column which will consist of dummy values (0 or 1). So basically there are 2 subquestions: calculating the median of a column and creating a new column of dummy variables.

 

Thanks!

ilikesas
Barite | Level 11

Here I would like to add some illustration:

 

suppose the initial data is something like this:

age
25
29
31
35

 

Then I would like the final data to be of the following:

age age_dummy
25 0
29 0
31 1
35 1

 

And please note that I have over 20 variables so if possible to make the entire process automatic for all the variables.

 

Thanks!

PGStats
Opal | Level 21

Use the RANK procedure

 

proc rank data=have out=want groups=2;
var var1-var20;
ranks rvar1-rvar20;
run;

Below and above median will be identified with group number 0 and 1, respectively.

PG
ilikesas
Barite | Level 11

Hi PGStats,

 

thank you for the reply, I clearly got the dummy values depending on the median.

 

now I get the dummy columns but the original columsn disappear, and I need the original columns and I can't just merge becasue both the original columns and their corresponding new dummy columns have the same name.

 

Thanks!

Reeza
Super User

If you used the RANK statement that's not what should have happened. 

 

Please post your code. 

Also verify that the variable name and labels are different. 

 

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473611.htm

ilikesas
Barite | Level 11

That's good now I realized that I forgot to put the "r" before the original varible name 

PGStats
Opal | Level 21

Dataset want should contain original variables (var1-var20) and new variables rvar1-rvar20, just as you need. I would need to see your log to understand what happened.

PG
ilikesas
Barite | Level 11

That's good now I realized that I forgot to put the "r" before the original varible name!

rogerjdeangelis
Barite | Level 11

 

I posted what I think is a more comprehensive 'R' based solution. This solution should map easly to IML.

 

https://listserv.uga.edu/cgi-bin/wa?A2=SAS-L;a2483999.1609b

ilikesas
Barite | Level 11

Hi rogerjdeangelis,

 

I would love to see the solution, but when I clicked on the link I was asked for a password, so is there another way of accessing your solution?

 

Thnak you!

aalasady0
Calcite | Level 5

Hello I am currently trying to create dummy variables for companies within countries.

 

This is my code below:

 

DATA WORK.IMPORT;
SET WORK.IMPORT;
IF COMPANY = COKE THEN COKED=1; ELSE COKED = 0;
IF COMPANY = KFC THEN KFCD=1; ELSE KFCD = 0;
IF COMPANY = PEPSI THEN PEPSID=1; ELSE PEPSID = 0;
IF COMPANY = NIKE THEN NIKED=1; ELSE NIKED = 0;
IF COMPANY = STARBUCKS THEN STARBUCKSD=1; ELSE STARBUCKSD = 0;
IF COMPANY = MCDONALD THEN MCDONALDD=1; ELSE MCDONALDD = 0;
IF COMPANY = HARDEES THEN HARDEESD=1; ELSE HARDEESD = 0;
RUN;

 

unfortunately the ouput for the dummy variables is all 0, so if the company is coke the output is also 0 and not 1

 

I don't understand what is wrong, please help.

 

Thanks

 

PGStats
Opal | Level 21

@aalasady0, you would draw more attention if you posted your question as a new topic.

 

Try using statements like

 

COKED = upcase(COMPANY) = "COKE";

KFCD = upcase(COMPANY) = "KFC";

 

etc.

 

 

PG
aalasady0
Calcite | Level 5

Thank you for your reply, I apologize for my ignorence but I am very new to SAS.

 

1.) What does this statement do?

 

2.) Where would this fit in my code?

 

Thanks

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 6596 views
  • 3 likes
  • 5 in conversation