BookmarkSubscribeRSS Feed
munitech4u
Quartz | Level 8

I have a dataset A that contains some 600+ variables.

I have another dataset B which contains variable name and corresponding percentile values. I want to check say 90percentile value from this dataset B for a variable and remove all observations from dataset A which are greater than that value for same variable. Basically outlier removal.

But doing this manually for so many variables using if condition or doing macro call for 600 variables will be very time consuming.

Is there any efficient way of doing this?

12 REPLIES 12
RW9
Diamond | Level 26 RW9
Diamond | Level 26

HI,

Well, guessing as no test data (in form of dataset) or output is provided:

merge dataaset B to A based on key identifiers.

then do something like:

data want;

     set have;

     array all_my_vars{600};  /* Assumes they all have prefix of all_my_vars */

     do i=1 to 600;

          if all_my_vars{i} > percentile_merged_on_from_b then all_my_vars{i}=.;

     end;

run;

munitech4u
Quartz | Level 8

the problem is all my variables have customized names. and for merging based on variable names, i believe require transpose?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, you will have to normalise your data then.  In all fairness, unless there is a specific reason to have 600+ variables (which is very rare), its best to keep them normalised anyways.  Something like:

proc transpose data=have out=want;

     by <id_vars>;

     var  var1 varx vat...;

run;

If there is too many to type by hand, then you can use the to modifier if your on later SAS versions:

     var var1-vat;

Ksharp
Super User

Check IML code:

https://communities.sas.com/message/282057#282057

Code: Program

data have;
call streaminit(1234);
do i=1 to 100;
  a=ceil(rand('uniform')*100);
  b=ceil(rand('uniform')*100);
  output;
end;
drop i;
run;


%let high=0.9;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;

call qntl(q,x,{&high});
do i=1 to ncol(x);
x[loc(x[,i]>q[i]),i]=.;
end;

create want from x[c=vname];
append from x;
close want;
quit;


Xia Keshan

munitech4u
Quartz | Level 8

Thanks Xia, your inputs have always been great. But IML comes as separate package for SAS and we are using enterprise guide.

Patrick
Opal | Level 21

If you have IML licensed (on the server) then you can run such code out of SAS Enterprise Guide (the client). The client sends code to the server for execution.

Ksharp
Super User

How about download SAS University Edition at sas.com . It is totally free.

I still remember Reeza write a macro to winsorize a data set . Maybe you can call her to change her code to suit your requirement?

Search :  winsorize + Reeza  at this forum .

Xia Keshan

munitech4u
Quartz | Level 8

I tried creating all the values in macro variable for 600 variable names and 600 values and tried running the do loop:

DATA LIB.SUB_FINAL1;

SET LIB.SUB_FINAL;

  %do i=1 %to 600 %by 1;

  %IF &&VARNAME&I=. %THEN &&VARNAME&I=&&PERC_MEDN&I;

%end;

RUN;

Thought values are resolving correctly, but its not doing, what it is intended to do.

e.g. log varname344 resolved to amount_ex

%IF condition &&VARNAME&I=. is FALSE

now amount_ex is present in the dataset, but why it is saying false? does it run only over one row? do i need to use some other counter like j=1 to 80000 to run over all rows?

munitech4u
Quartz | Level 8

It is interesting, i removed the % from if condition and now, it seems to be working fine within macro. I also created 600 macro values each for upper and lower percentile and used the same condition, so basically it writes those many if statements with loop.

Reeza
Super User

Here's the macro that Xia was referring to, depending on how big your dataset is this may not be efficient, but its simple to understand and limits the number of macro variables so it's easy to debug.

SAS Box-Plot/Tukey Method of Capping Outliers · GitHub

Tom
Super User Tom
Super User

Use dataset B to generate the code to subset dataset A.

filename code temp;


data _null_;

  set b;

  put 'if ' var '>' value 'then delete;' ;

run;


data want ;

  set A;

  %include code / source2 ;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1444 views
  • 0 likes
  • 6 in conversation