I am wondering if I can use this code (please see below) for removing outliers from my data set. I am not familiar with macro at all. I tried this and it looks like it would work if I made modifications. Explaining the variables I used as a test: variable a: freq 6-41 (only outlier is 112) var b freq is 3-47 (outliers are 0 135 340 and 90750) The code below removes everything that is >16 for var a (var a freq is 6-16 in my outfile) and for var b it removes everything that is >47 and it actually also removes anything that is < 21 (my outfile shows freq 21-47 for var b) Probably has to do with the P25 and P75 below? How can I change the code to remove any outliers like 0 and anything that is higher than the max value for each var (which will vary considerably per var. I also have a lot more than only 2 vars). For example if var a max value is 41 anything higher than that would be removed. If the max for var b is 47 anything higher than that would be removed. Var c and d, if added to the code below, sas tells me it does not work: variable c freq 26.4 - 104, it has a lot of values with decimal places like 26.4 but also numbers without decimals. I would removed anything higher than the 104 though but I need the values with decimals to stay. Var d freq 11.7 - 400 but the decimals here are outliers...there should be no decimals in this var - freq should be only in the 100's and I have one random 11.7 that would need to be removed. The code below does not "like" the values with decimals. Is there a way to make changes in this code for what I need or should I do this in a completely different way? Thank you ! The code that I found and tried is below: %macro outliers(input=, var=, output= ); %let Q1=; %let Q3=; %let varL=; %let varH=; %let n=%sysfunc(countw(&var)); %do i= 1 %to &n; %let val = %scan(&var,&i); %let Q1 = &Q1 &val._P25; %let Q3 = &Q3 &val._P75; %let varL = &varL &val.L; %let varH = &varH &val.H; %end; /* Calculate the quartiles and inter-quartile range using proc univariate */ proc means data=&input nway noprint; var &var; output out=temp P25= P75= / autoname; run; /* Extract the upper and lower limits into macro variables */ data temp; set temp; ID = 1; array varb(&n) &Q1; array varc(&n) &Q3; array lower(&n) &varL; array upper(&n) &varH; do i = 1 to dim(varb); lower(i) = varb(i) - 3 * (varc(i) - varb(i)); upper(i) = varc(i) + 3 * (varc(i) - varb(i)); end; drop i _type_ _freq_; run; data temp1; set &input; ID = 1; run; data &output; merge temp1 temp; by ID; array var(&n) &var; array lower(&n) &varL; array upper(&n) &varH; do i = 1 to dim(var); if not missing(var(i)) then do; if var(i) >= lower(i) and var(i) <= upper(i); end; end; drop &Q1 &Q3 &varL &varH ID i; run; %mend; %outliers(input=abfile, var= a b, output= outabfile);
... View more