I am trying to filter outliers in my dataset. Specifically, I want to filter values outside the interval [0.5th percentile, 99.5th percentile]. I use the following code:
PROC UNIVARIATE DATA = BDD;
VAR AMOUNT;
OUTPUT OUT = TEMP
PCTLPRE = P_ PCTLPTS = 0.5, 99.5;
QUIT;
DATA _NULL_;
SET TEMP;
CALL SYMPUT('LB', P_0_5);
CALL SYMPUT('UB', P_99_5);
RUN;
However, when I try to then filter values, I get an error since the macro variables are stored as string, not numeric values. How could I use them as numeric in later steps?
Your idea is basically sound, as can be proven with this:
proc univariate data=sashelp.baseball;
var natbat;
output
out=temp
pctlpre=P_
pctlpts = 0.5, 99.5
;
run;
data _null_;
set temp;
call symput('LB',P_0_5);
call symput('UB',P_99_5);
run;
data want;
set sashelp.baseball;
where natbat gt &lb. and natbat lt &ub.;
run;
which works as intended:
73 proc univariate data=sashelp.baseball; 74 var natbat; 75 output 76 out=temp 77 pctlpre=P_ 78 pctlpts = 0.5, 99.5 79 ; 80 run; NOTE: The data set WORK.TEMP has 1 observations and 2 variables. NOTE: Verwendet wurde: PROZEDUR UNIVARIATE - (Gesamtverarbeitungszeit): real time 0.10 seconds cpu time 0.10 seconds 81 82 data _null_; 83 set temp; 84 call symput('LB',P_0_5); 85 call symput('UB',P_99_5); 86 run; NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 84:18 85:18 NOTE: There were 1 observations read from the data set WORK.TEMP. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds 87 88 data want; 89 set sashelp.baseball; 90 where natbat gt &lb. and natbat lt &ub.; 91 run; NOTE: There were 318 observations read from the data set SASHELP.BASEBALL. WHERE (natbat>127 and natbat<680); NOTE: The data set WORK.WANT has 318 observations and 24 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds
Please post the complete log of the step in which you use the macro variables and get the ERROR.
Use this button to open a window into which you copy/paste the log text:
Your idea is basically sound, as can be proven with this:
proc univariate data=sashelp.baseball;
var natbat;
output
out=temp
pctlpre=P_
pctlpts = 0.5, 99.5
;
run;
data _null_;
set temp;
call symput('LB',P_0_5);
call symput('UB',P_99_5);
run;
data want;
set sashelp.baseball;
where natbat gt &lb. and natbat lt &ub.;
run;
which works as intended:
73 proc univariate data=sashelp.baseball; 74 var natbat; 75 output 76 out=temp 77 pctlpre=P_ 78 pctlpts = 0.5, 99.5 79 ; 80 run; NOTE: The data set WORK.TEMP has 1 observations and 2 variables. NOTE: Verwendet wurde: PROZEDUR UNIVARIATE - (Gesamtverarbeitungszeit): real time 0.10 seconds cpu time 0.10 seconds 81 82 data _null_; 83 set temp; 84 call symput('LB',P_0_5); 85 call symput('UB',P_99_5); 86 run; NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 84:18 85:18 NOTE: There were 1 observations read from the data set WORK.TEMP. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.00 seconds 87 88 data want; 89 set sashelp.baseball; 90 where natbat gt &lb. and natbat lt &ub.; 91 run; NOTE: There were 318 observations read from the data set SASHELP.BASEBALL. WHERE (natbat>127 and natbat<680); NOTE: The data set WORK.WANT has 318 observations and 24 variables. NOTE: Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit): real time 0.00 seconds cpu time 0.01 seconds
Thanks for the reply! Actually, I just didn't realize SAS would interpret the macrovariables as numeric when using them in logical conditions. I'm new around here, if I need to delete the post just let me know!
You are correct in saying that macro variables are text, because the macro processor is a text engine. But that text is used as program text, just as if you yourself typed it. If macro variables can be used in a certain place depends on the contents of the macro variables and the context.
In your case, the macro variables contain text with a numeric value, and can be used in a place where numbers are required
While symput will work, symputx is the better choice.
Could you elaborate?
CALL SYMPUTX will do the conversion from numeric to character on its own, thereby avoiding the conversion NOTE in the log, and it allows you to control (through a third optional parameter) into which symbol table (local or global) a macro variable is stored.
In some cases the textual representation of a numeric value (i.e. default is a display format BEST12.) used by SYMPUTX that gets assigned to a macro variable is not EXACTLY the numeric value you want to use later on when the macro variable is resolved.
Remedies
Example (macro)
data have; z = 12345.99999999; call symputx ('z_mvar_rep1', z); * rep means representation; call symput ('z_mvar_rep2', put(z, hex16.)); run; %put NOTE: &=z_mvar_rep1; %put NOTE: &=z_mvar_rep2; data later; set have; diff1 = z - &z_mvar_rep1; diff2 = z - input("&z_mvar_rep2", hex16.); run;
Log
NOTE: Z_MVAR_REP1=12346 NOTE: Z_MVAR_REP2=40C81CFFFFFFEA86
Example (set)
data later; set have; if _n_ = 1 then set have (rename=z=z_earlier); format z_earlier best20.; diff = z - z_earlier; run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.