Hi,
I have a data with over 20 variables and over a 100 rows of data.
For each variable I would like to create a dummy variable which depends on the median of the values of that variable. Here is an example to illustrate:
I have a column named "age" containing ages. I would like to calculate the median age, and suppose it equals 29, so if for an entry the value for age is greater than the median, say 35, then the new variable "age_dummy" will be equal to 1, and if age is less than or equal to the median, like 25, than the dummy will be equal to 0.
so if the initial data is something like this:
age |
25 |
29 |
31 |
35 |
Then I would like the final data to be of the following:
age | age_dummy |
25 | 0 |
29 | 0 |
31 | 1 |
35 | 1 |
Thank you!
Use the RANK procedure
proc rank data=have out=want groups=2;
var var1-var20;
ranks rvar1-rvar20;
run;
Below and above median will be identified with group number 0 and 1, respectively.
So which part is your question?
Hi Reeza,
for each column I would like to calculate the median of the numbers, and then for each colum create a new column which will consist of dummy values (0 or 1). So basically there are 2 subquestions: calculating the median of a column and creating a new column of dummy variables.
Thanks!
Here I would like to add some illustration:
suppose the initial data is something like this:
age |
25 |
29 |
31 |
35 |
Then I would like the final data to be of the following:
age | age_dummy |
25 | 0 |
29 | 0 |
31 | 1 |
35 | 1 |
And please note that I have over 20 variables so if possible to make the entire process automatic for all the variables.
Thanks!
Use the RANK procedure
proc rank data=have out=want groups=2;
var var1-var20;
ranks rvar1-rvar20;
run;
Below and above median will be identified with group number 0 and 1, respectively.
Hi PGStats,
thank you for the reply, I clearly got the dummy values depending on the median.
now I get the dummy columns but the original columsn disappear, and I need the original columns and I can't just merge becasue both the original columns and their corresponding new dummy columns have the same name.
Thanks!
If you used the RANK statement that's not what should have happened.
Please post your code.
Also verify that the variable name and labels are different.
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473611.htm
That's good now I realized that I forgot to put the "r" before the original varible name
Dataset want should contain original variables (var1-var20) and new variables rvar1-rvar20, just as you need. I would need to see your log to understand what happened.
That's good now I realized that I forgot to put the "r" before the original varible name!
I posted what I think is a more comprehensive 'R' based solution. This solution should map easly to IML.
Hi rogerjdeangelis,
I would love to see the solution, but when I clicked on the link I was asked for a password, so is there another way of accessing your solution?
Thnak you!
Hello I am currently trying to create dummy variables for companies within countries.
This is my code below:
DATA WORK.IMPORT;
SET WORK.IMPORT;
IF COMPANY = COKE THEN COKED=1; ELSE COKED = 0;
IF COMPANY = KFC THEN KFCD=1; ELSE KFCD = 0;
IF COMPANY = PEPSI THEN PEPSID=1; ELSE PEPSID = 0;
IF COMPANY = NIKE THEN NIKED=1; ELSE NIKED = 0;
IF COMPANY = STARBUCKS THEN STARBUCKSD=1; ELSE STARBUCKSD = 0;
IF COMPANY = MCDONALD THEN MCDONALDD=1; ELSE MCDONALDD = 0;
IF COMPANY = HARDEES THEN HARDEESD=1; ELSE HARDEESD = 0;
RUN;
unfortunately the ouput for the dummy variables is all 0, so if the company is coke the output is also 0 and not 1
I don't understand what is wrong, please help.
Thanks
@aalasady0, you would draw more attention if you posted your question as a new topic.
Try using statements like
COKED = upcase(COMPANY) = "COKE";
KFCD = upcase(COMPANY) = "KFC";
etc.
Thank you for your reply, I apologize for my ignorence but I am very new to SAS.
1.) What does this statement do?
2.) Where would this fit in my code?
Thanks
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.