axescot78
Quartz | Level 8

I have a data set with both continuous and categorical variables. For the continuous variables, I need to find extreme values and replace them with missing values. I've gotten this far:

 

/* Calculate Median and IQR */
PROC UNIVARIATE DATA = kddcup98 NOPRINT;
VAR DemAge
DemMedHomeValue
DemMedIncome
DemPctVeterans
GiftAvg36
GiftAvgAll
GiftAvgCard36
GiftAvgLast
GiftCnt36
GiftCntAll
GiftCntCard36
GiftCntCardAll
GiftTimeFirst
GiftTimeLast
PromCnt12
PromCnt36
PromCntAll
PromCntCard12
PromCntCard36
PromCntCardAll
TARGET_D;
OUTPUT OUT = boxStats p25 = p25 p75 = p75 QRANGE = iqr;
RUN;

DATA _null_;
SET boxStats;
CALL symput ('p25',p25);
CALL symput ('p75',p75);
CALL symput ('iqr', iqr);
RUN;

%PUT &p25;
%PUT &p75;
%PUT &iqr;

DATA trimmed;
SET kddcup98;
ARRAY change _numeric_;
DO OVER change;
IF (change > &p75 + 1.5 * &iqr) OR (change < &p25 - 1.5 * &iqr) THEN change = .;
END;
RUN;

/* List Variables with Missing Values */
PROC MEANS DATA=trimmed NMISS N;
TITLE 'trimmed Variables with Number of Missing Values (NMISS) and Number of Numeric Values (N)';
RUN;

 

The only problem is that it miscalculates the number of extreme values. In some cases, it flags most of the values as extreme.

Rick_SAS
SAS Super FREQ

Are you trying to trim or Winsorize each variable? If so, please read "Winsorization: The good, the bad, and the ugly," which discusses the statistical implications of getting rid of extreme values. If you decide to proceed and Winsorize your data, the article also contains links to a second article about how to Winsorize, and you can easily modify it to replace extreme values with missing values.
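If you do go ahead with replacing extreme values by missing values, here is a rough, untested sketch of one way to give each variable its own fences. It rests on two observations about the posted code: the OUTPUT statement supplies only one name per statistic (p25, p75, iqr), so it appears to save statistics for the first VAR variable (DemAge) only, and ARRAY change _numeric_ then applies those DemAge fences to every numeric variable, including any numeric-coded categorical ones. The macro variables contVars and nVars, the output names q1_1, q3_1, iqr_1 and so on, and the array names are all made up for illustration. PROC MEANS is swapped in for PROC UNIVARIATE here only because its OUTPUT statement makes it convenient to name one statistic per variable.

```sas
/* Continuous variables only (same list as the original VAR statement). */
%let contVars = DemAge DemMedHomeValue DemMedIncome DemPctVeterans
                GiftAvg36 GiftAvgAll GiftAvgCard36 GiftAvgLast
                GiftCnt36 GiftCntAll GiftCntCard36 GiftCntCardAll
                GiftTimeFirst GiftTimeLast
                PromCnt12 PromCnt36 PromCntAll
                PromCntCard12 PromCntCard36 PromCntCardAll
                TARGET_D;
%let nVars = %sysfunc(countw(&contVars, %str( )));

/* One Q1, Q3, and IQR per variable, named by position in the list. */
proc means data=kddcup98 noprint;
   var &contVars;
   output out=boxStats(drop=_type_ _freq_)
          q1=q1_1-q1_&nVars
          q3=q3_1-q3_&nVars
          qrange=iqr_1-iqr_&nVars;
run;

data trimmed;
   /* Read the one-row statistics once; SET variables are retained. */
   if _n_ = 1 then set boxStats;
   set kddcup98;
   array contArr {*} &contVars;
   array q1Arr   {*} q1_1-q1_&nVars;
   array q3Arr   {*} q3_1-q3_&nVars;
   array iqrArr  {*} iqr_1-iqr_&nVars;
   do _i = 1 to dim(contArr);
      /* Compare each variable against its own fences. */
      if contArr[_i] > q3Arr[_i] + 1.5*iqrArr[_i]
         or contArr[_i] < q1Arr[_i] - 1.5*iqrArr[_i]
         then contArr[_i] = .;
   end;
   drop _i q1_: q3_: iqr_:;
run;
```

Because the arrays list only the continuous variables, categorical variables stored as numbers are left untouched. Running the PROC MEANS NMISS step from the question on trimmed should then give counts that reflect each variable's own quartiles.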

 

If you only want the trimmed or Winsorized means and StdDev, you can use the ROBUSTSCALE option, the TRIMMED= option, and the WINSORIZED= option in the PROC UNIVARIATE statement to obtain robust estimates without modifying the original data.
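As a minimal sketch of those options, assuming the kddcup98 data set from the question: the two variables on the VAR statement and the 5% proportion are arbitrary choices for illustration, not recommendations.

```sas
/* Robust location and scale estimates; the input data are not modified. */
proc univariate data=kddcup98 robustscale trimmed=0.05 winsorized=0.05;
   var DemAge GiftAvgAll;
   /* Show only the robust-estimate tables. */
   ods select TrimmedMeans WinsorizedMeans RobustScale;
run;
```

A value between 0 and 0.5 for TRIMMED= or WINSORIZED= is read as the proportion of observations handled in each tail; an integer is read as a count.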
