RobertWang
Calcite | Level 5

Hello,

I have a dataset which consists of over 100 continuous variables.

What I want to do is create corresponding dummy variables in the same dataset according to their medians.

Because of the large number of variables, I'd like to do it by means of %macro.

In the attachment, I first input a dataset as an example and provide some code that I've tried.

I have basic programming ability, but none at all in writing macros.

Could you please do me a favor?  I appreciate your efforts and help.

 

 

Accepted Solutions
mkeintz
PROC Star

@RobertWang

 

I don't think you need a macro.  What you really need is

  1. A list of variables to pass through proc means processing (i.e.  VAR TGFB2--COH1_208834; in your case - NOTE THE DOUBLE DASH)
  2. Tell the proc means to generate medians and assign them to a corresponding list of names. 
  3. Run a data step comparing the original list of variables to the corresponding list of medians, generating yet another corresponding list of dummy variables.

Here's an example using sashelp.class:

 

proc means data=sashelp.class noprint ;
  var age-numeric-weight;
  output out=meds (drop=_type_ _freq_) median= p50= / autoname;
run;

data want;
  set sashelp.class ;
  if _n_=1 then set meds;
  array vars {*} age--weight ;
  array meds {*} age_median--weight_median;
  array p50  {*} age_p50--weight_p50;
  do i=1 to dim(vars);
    p50{i}=ifn(vars{i}=.,.,vars{i}>meds{i});
  end;
  drop age_median--weight_median i;
run;

 

  1. The var statement tells proc means to generate stats on all the numeric variables from age (on the left) through weight (on the right).
  2. The output statement tells SAS to generate two stats for each variable: median and percentile 50.  Yes, they are the same stat, but the autoname option tells SAS to name the new variables age_median, height_median, weight_median and age_p50, height_p50, weight_p50.  This will be handy in the following data step.
  3. The data step defines three "synchronized" arrays.  Each element of the P50 array is overwritten with 1 if the original variable is greater than its corresponding median, 0 if it is not, and missing if the original value is missing.
  4. Note that the "if _n_=1 then set ..." statement reads the medians and p50's once and does not reset them to missing in subsequent data step iterations.  I.e. they are "retained".
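
For your own data, the same pattern might look like the sketch below. I am assuming the dataset is called HAVE (a placeholder name, not taken from your attachment), that TGFB2 is the first and COH1_208834 the last variable of interest in dataset order, and that the autonamed variables keep the _median and _p50 suffixes for those two names. I have not run this against your data.

```sas
/* Sketch only: HAVE is a placeholder for the real dataset name. */
proc means data=have noprint;
  /* all numeric variables positioned from TGFB2 through COH1_208834 */
  var TGFB2-numeric-COH1_208834;
  output out=meds (drop=_type_ _freq_) median= p50= / autoname;
run;

data want;
  set have;
  if _n_=1 then set meds;   /* one-row summary, read once and retained */
  array vars {*} TGFB2-numeric-COH1_208834;
  array meds {*} TGFB2_median--COH1_208834_median;
  array p50  {*} TGFB2_p50--COH1_208834_p50;
  do i=1 to dim(vars);
    /* 1 if above the median, 0 if not, missing if the value is missing */
    p50{i}=ifn(missing(vars{i}), ., vars{i}>meds{i});
  end;
  drop TGFB2_median--COH1_208834_median i;
run;
```

The dummy variables end up named with the _p50 suffix (TGFB2_p50 and so on). Using -numeric- in the range keeps any character variables sitting between the two endpoints out of the arrays.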
PaigeMiller
Diamond | Level 26

What I want to do is to create corresponding dummy variables in the same dataset according to their medians. 

 

What happens after you find the medians of each variable? What is the next step? How do you go from medians to dummy variables?

 

I doubt a macro is needed here, but the answer really depends on what you do with the medians.

--
Paige Miller
RobertWang
Calcite | Level 5

Thank you, @PaigeMiller and @mkeintz, for your prompt responses.

I can't believe I got help so quickly.

The example provided by @mkeintz completely solved my problem.

And it's so elegant. (Although I still don't understand part of the program, it works well.)

Thank you so much for the kind help.

Wish you a Merry Christmas & Happy New Year!!

Cheers

 

 

 

mkeintz
PROC Star

@RobertWang

 

Which part of my program don't you understand yet?  I can edit my answer to clarify.
