BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
93alejandrosanc
Fluorite | Level 6

I am trying to filter outliers in my dataset. Specifically, I want to filter values outside the interval [0.5th percentile, 99.5th percentile]. I use the following code:

 

PROC UNIVARIATE DATA = BDD;

    VAR AMOUNT;

    OUTPUT OUT = TEMP

    PCTLPRE = P_ PCTLPTS = 0.5, 99.5;

QUIT;

DATA _NULL_;

    SET TEMP;

    CALL SYMPUT('LB', P_0_5);

    CALL SYMPUT('UB', P_99_5);

RUN;

 

However, when I try to then filter values, I get an error since the macro variables are stored as string, not numeric values. How could I use them as numeric in later steps?

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Your idea is basically sound, as can be proven with this:

proc univariate data=sashelp.baseball;
var natbat;
output
  out=temp
  pctlpre=P_ 
  pctlpts = 0.5, 99.5
;
run;

data _null_;
set temp;
call symput('LB',P_0_5);
call symput('UB',P_99_5);
run;

data want;
set sashelp.baseball;
where natbat gt &lb. and natbat lt &ub.;
run;

which works as intended:

 73         proc univariate data=sashelp.baseball;
 74         var natbat;
 75         output
 76           out=temp
 77           pctlpre=P_
 78           pctlpts = 0.5, 99.5
 79         ;
 80         run;
 
 NOTE: The data set WORK.TEMP has 1 observations and 2 variables.
 NOTE:  Verwendet wurde: PROZEDUR UNIVARIATE - (Gesamtverarbeitungszeit):
       real time           0.10 seconds
       cpu time            0.10 seconds
       
 
 81         
 82         data _null_;
 83         set temp;
 84         call symput('LB',P_0_5);
 85         call symput('UB',P_99_5);
 86         run;
 
 NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
       84:18   85:18   
 NOTE: There were 1 observations read from the data set WORK.TEMP.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 87         
 88         data want;
 89         set sashelp.baseball;
 90         where natbat gt &lb. and natbat lt &ub.;
 91         run;
 
 NOTE: There were 318 observations read from the data set SASHELP.BASEBALL.
       WHERE (natbat>127 and natbat<680);
 NOTE: The data set WORK.WANT has 318 observations and 24 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds

View solution in original post

9 REPLIES 9
Kurt_Bremser
Super User

Please post the complete log of the step in which you use the macro variables and get the ERROR.

Use this button to open a window into which you copy/paste the log text:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

Kurt_Bremser
Super User

Your idea is basically sound, as can be proven with this:

proc univariate data=sashelp.baseball;
var natbat;
output
  out=temp
  pctlpre=P_ 
  pctlpts = 0.5, 99.5
;
run;

data _null_;
set temp;
call symput('LB',P_0_5);
call symput('UB',P_99_5);
run;

data want;
set sashelp.baseball;
where natbat gt &lb. and natbat lt &ub.;
run;

which works as intended:

 73         proc univariate data=sashelp.baseball;
 74         var natbat;
 75         output
 76           out=temp
 77           pctlpre=P_
 78           pctlpts = 0.5, 99.5
 79         ;
 80         run;
 
 NOTE: The data set WORK.TEMP has 1 observations and 2 variables.
 NOTE:  Verwendet wurde: PROZEDUR UNIVARIATE - (Gesamtverarbeitungszeit):
       real time           0.10 seconds
       cpu time            0.10 seconds
       
 
 81         
 82         data _null_;
 83         set temp;
 84         call symput('LB',P_0_5);
 85         call symput('UB',P_99_5);
 86         run;
 
 NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
       84:18   85:18   
 NOTE: There were 1 observations read from the data set WORK.TEMP.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 
 87         
 88         data want;
 89         set sashelp.baseball;
 90         where natbat gt &lb. and natbat lt &ub.;
 91         run;
 
 NOTE: There were 318 observations read from the data set SASHELP.BASEBALL.
       WHERE (natbat>127 and natbat<680);
 NOTE: The data set WORK.WANT has 318 observations and 24 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds
93alejandrosanc
Fluorite | Level 6

Thanks for the reply! Actually, I just didn't realize SAS would interpret the macrovariables as numeric when using them in logical conditions. I'm new around here, if I need to delete the post just let me know!

Kurt_Bremser
Super User

You are correct in saying that macro variables are text, because the macro processor is a text engine. But that text is used as program text, just as if you yourself typed it. If macro variables can be used in a certain place depends on the contents of the macro variables and the context.

In your case, the macro variables contain text with a numeric value, and can be used in a place where numbers are required

WarrenKuhfeld
Ammonite | Level 13

While symput will work, symputx is the better choice.

93alejandrosanc
Fluorite | Level 6

Could you elaborate?

WarrenKuhfeld
Ammonite | Level 13
google symputx
Kurt_Bremser
Super User

CALL SYMPUTX will do the conversion from numeric to character on its own, thereby avoiding the conversion NOTE in the log, and it allows you to control (through a third optional parameter) into which symbol table (local or global) a macro variable is stored.

RichardDeVen
Barite | Level 11

In some cases the textual representation of a numeric value (i.e. default is a display format BEST12.) used by SYMPUTX that gets assigned to a macro variable is not EXACTLY the numeric value you want to use later on when the macro variable is resolved.

 

Remedies

  • Use format HEX16. to place an exact textual representation of the numeric value in a macro variable
  • Use a merge or set to introduce the variable(s) to later DATA Step processing

Example (macro)

data have;
  z = 12345.99999999;
  call symputx ('z_mvar_rep1', z);  * rep means representation;
  call symput  ('z_mvar_rep2', put(z, hex16.));
run;

%put NOTE: &=z_mvar_rep1;
%put NOTE: &=z_mvar_rep2;

data later;
  set have;
  diff1 = z - &z_mvar_rep1;
  diff2 = z - input("&z_mvar_rep2", hex16.);
run;

Log

NOTE: Z_MVAR_REP1=12346
NOTE: Z_MVAR_REP2=40C81CFFFFFFEA86

Example (set)

data later;
  set have;
  if _n_ = 1 then set have (rename=z=z_earlier);

  format z_earlier best20.;

  diff = z - z_earlier;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 3209 views
  • 8 likes
  • 4 in conversation