I have a dataset A that contains some 600+ variables.
I have another dataset B which contains variable name and corresponding percentile values. I want to check say 90percentile value from this dataset B for a variable and remove all observations from dataset A which are greater than that value for same variable. Basically outlier removal.
But doing this manually for so many variables using if condition or doing macro call for 600 variables will be very time consuming.
Is there any efficient way of doing this?
HI,
Well, guessing as no test data (in form of dataset) or output is provided:
merge dataaset B to A based on key identifiers.
then do something like:
data want;
set have;
array all_my_vars{600}; /* Assumes they all have prefix of all_my_vars */
do i=1 to 600;
if all_my_vars{i} > percentile_merged_on_from_b then all_my_vars{i}=.;
end;
run;
the problem is all my variables have customized names. and for merging based on variable names, i believe require transpose?
Yes, you will have to normalise your data then. In all fairness, unless there is a specific reason to have 600+ variables (which is very rare), its best to keep them normalised anyways. Something like:
proc transpose data=have out=want;
by <id_vars>;
var var1 varx vat...;
run;
If there is too many to type by hand, then you can use the to modifier if your on later SAS versions:
var var1-vat;
Check IML code:
https://communities.sas.com/message/282057#282057
data have;
call streaminit(1234);
do i=1 to 100;
a=ceil(rand('uniform')*100);
b=ceil(rand('uniform')*100);
output;
end;
drop i;
run;
%let high=0.9;
proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&high});
do i=1 to ncol(x);
x[loc(x[,i]>q[i]),i]=.;
end;
create want from x[c=vname];
append from x;
close want;
quit;
Thanks Xia, your inputs have always been great. But IML comes as separate package for SAS and we are using enterprise guide.
If you have IML licensed (on the server) then you can run such code out of SAS Enterprise Guide (the client). The client sends code to the server for execution.
NO, they don't have IML.
How about download SAS University Edition at sas.com . It is totally free.
I still remember Reeza write a macro to winsorize a data set . Maybe you can call her to change her code to suit your requirement?
Search : winsorize + Reeza at this forum .
Xia Keshan
I tried creating all the values in macro variable for 600 variable names and 600 values and tried running the do loop:
DATA LIB.SUB_FINAL1;
SET LIB.SUB_FINAL;
%do i=1 %to 600 %by 1;
%IF &&VARNAME&I=. %THEN &&VARNAME&I=&&PERC_MEDN&I;
%end;
RUN;
Thought values are resolving correctly, but its not doing, what it is intended to do.
e.g. log varname344 resolved to amount_ex
%IF condition &&VARNAME&I=. is FALSE
now amount_ex is present in the dataset, but why it is saying false? does it run only over one row? do i need to use some other counter like j=1 to 80000 to run over all rows?
It is interesting, i removed the % from if condition and now, it seems to be working fine within macro. I also created 600 macro values each for upper and lower percentile and used the same condition, so basically it writes those many if statements with loop.
Here's the macro that Xia was referring to, depending on how big your dataset is this may not be efficient, but its simple to understand and limits the number of macro variables so it's easy to debug.
Use dataset B to generate the code to subset dataset A.
filename code temp;
data _null_;
set b;
put 'if ' var '>' value 'then delete;' ;
run;
data want ;
set A;
%include code / source2 ;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.