Dear all,
I have created a macro variable for over 200 metabolites &biomarker.
I have standardized &biomarker variable and by running proc univariate I can see that half of the biomarkers have outliers.
I tried this code to set the outliers as missing values:
data met; set met;
%if &biomarker<-5 %then %let &biomarker=.;
%else %if &biomarker>5 %then %let &biomarker=.;
run;
I got these error messages
ERROR: The %IF statement is not valid in open code.
ERROR: The %ELSE statement is not valid in open code.
Even though I saw the posts saying that it was possible to run these statements in the open code.
Then I tried to run it within a macro
%macro clean;
data met; set met;
%if &biomarker<-5 %then %let &biomarker=.;
%else %if &biomarker>5 %then %let &biomarker=.;
run;
%mend clean;
%clean
ERROR: Expecting a variable name after %LET.
ERROR: Symbolic variable name XXLVLDLP XXLVLDLL XXLVLDLPL XXLVL must be 32 or fewer characters
long.
ERROR: The macro CLEAN will stop executing.
Then I tried to condition with just one condition at the time
%macro clean;
data met; set met;
%if &biomarker<-5 %then %let &biomarker=.; run;
%mend clean;
I did not get any error messages. But when I run proc univariate with &biomarker I can see that all values below -5 are still in the data,
Please help me to modify my code to execute data cleaning.
Thank you very much in advance!
My bad. I wrote the code wrong. THis should fix it
data want;
set have;
array x XXLVLDLP XXLVLDLL XXLVLDLPL XXLVLDLC XXLVLDLCE
XXLVLDLFC XXLVLDLTG XLVLDLP XLVLDLL XLVLDLPL XLVLDLC;
do i=1 to dim(x);
if x(i)>5 or x<(i)-5 then x(i)=.;
end;
drop i;
run;
DATA steps do not run macro language %IF %THEN statements.
You can still refer to a macro variable using &biomarker, but get rid of all the percent signs.
In addition, what does &biomarker resolve to? That will be necessary to come up with right DATA step logic to replace what you have now.
Thank you for your answer!
Running the code without % signs gives this error:
NOTE: Line generated by the macro variable "BIOMARKER".
1 XXLVLDLP XXLVLDLL XXLVLDLPL XXLVLDLC XXLVLDLCE XXLVLDLFC XXLVLDLTG XLVLDLP XLVLDLL
--------
388
76
1 ! XLVLDLPL XLVLDLC
ERROR 388-185: Expecting an arithmetic operator.
ERROR 76-322: Syntax error, statement will be ignored.
&biomarker is the set of 200+ numeric variables representing metabolites
Forget macro programming for the moment, learn to use the data step first.
What does
%put &biomarker.;
reveal in the log? Please post that log, using the {i} button.
How did you create the macro variable? You are MOST LIKELY better off by keeping that data in a dataset and manipulate it with data step means.
Hi, if macro if not the optimal solution to my problem, I would really appreciate any advice of what may be.
I will reduce macro variable to just a few items to make it easier.
This is how I created a macro variable:
%let biomarker = XXLVLDLP XXLVLDLL XXLVLDLPL XXLVLDLC XXLVLDLCE XXLVLDLFC XXLVLDLTG XLVLDLP XLVLDLL
XLVLDLPL XLVLDLC;
here is the log you requested
1651 %put &biomarker;
XXLVLDLP XXLVLDLL XXLVLDLPL XXLVLDLC XXLVLDLCE XXLVLDLFC XXLVLDLTG XLVLDLP XLVLDLL XLVLDLPL
XLVLDLC
data want;
set have;
array x XXLVLDLP XXLVLDLL XXLVLDLPL XXLVLDLC XXLVLDLCE
XXLVLDLFC XXLVLDLTG XLVLDLP XLVLDLL XLVLDLPL XLVLDLC;
do i=1 to dim(x);
if x>5 or x<-5 then x=.;
end;
drop i;
run;
Thank you so much!
Unfortunately, the applying this code creates the array which includes all of the variables in my dataset including id and covariates.
Here is the log:
2249 data new; set met;
2250 array x XXLVLDLP XXLVLDLL XXLVLDLPL XXLVLDLC XXLVLDLCE
2251 XXLVLDLFC XXLVLDLTG XLVLDLP XLVLDLL XLVLDLPL XLVLDLC;
2252 do i=1 to dim(x);
2253 if x>5 or x<-5 then x=.;
2254 end;
2255 drop i;
2256 run;
ERROR: Array subscript out of range at line 2253 column 12.
My bad. I wrote the code wrong. THis should fix it
data want;
set have;
array x XXLVLDLP XXLVLDLL XXLVLDLPL XXLVLDLC XXLVLDLCE
XXLVLDLFC XXLVLDLTG XLVLDLP XLVLDLL XLVLDLPL XLVLDLC;
do i=1 to dim(x);
if x(i)>5 or x<(i)-5 then x(i)=.;
end;
drop i;
run;
@PaigeMiller forgot to add the array references in the code:
data want;
set have;
array x XXLVLDLP XXLVLDLL XXLVLDLPL XXLVLDLC XXLVLDLCE
XXLVLDLFC XXLVLDLTG XLVLDLP XLVLDLL XLVLDLPL XLVLDLC;
do i=1 to dim(x);
if x{i}>5 or x{i}<-5 then x{i}=.;
end;
drop i;
run;
Dear @Kurt_Bremser and @PaigeMiller ,
Thank you so much! The code worked and my problem is solved!
Thank you so very much again!
And you can now use the macro variable to define the array:
array x {*} &biomarker.;
Thank you so much, @Kurt_Bremser !
@Polina_UH wrote:
I got these error messages
ERROR: The %IF statement is not valid in open code.
ERROR: The %ELSE statement is not valid in open code.
Even though I saw the posts saying that it was possible to run these statements in the open code.
These macro commands can be run in open code starting in SAS 9.4 Maintenance Release 5, but not in earlier versions of SAS.
%macro clean;
data met; set met;
%if &biomarker<-5 %then %let &biomarker=.;
%else %if &biomarker>5 %then %let &biomarker=.;
run;
%mend clean;%clean
ERROR: Expecting a variable name after %LET.
The syntax of the %LET command usually does not include an ampersand (although it can include an ampersand, I don't believe you want to use the & in this situation).
But as stated by @Astounding this probably is not the proper way to do things.
Taking a big step backwards, if you are trying to eliminate outliers, I'm not seeing a need for macros at all. If you standardize each of your 200 variables somehow (it seems like you did this part already) and then you want any value of the standardized variable less then -5 or greater than 5 to be set to missing, this is an example where ARRAY statements work perfectly, and then no macros (and so no complication from macros) is needed.
Hi, thank you for you answer.
I standardized them just by using this code and it worked:
proc standard data=met mean=0 std=1 out=met;
var &biomarker;
run;
I'm not sure how to create an array: my variables have very different names and no consecutive numbers...
@Polina_UH wrote:
Hi, thank you for you answer.
I standardized them just by using this code and it worked:
proc standard data=met mean=0 std=1 out=met;
var &biomarker;
run;
I'm not sure how to create an array: my variables have very different names and no consecutive numbers...
I added array code to Message #7 in this thread.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.