BookmarkSubscribeRSS Feed
ehsanmath
Obsidian | Level 7

Hallo,

 

do anyone here know an efficient way of replacing (upper) extreme values with (95th) percentile.

I have a Dataset with around 600 variables and want to get rid of  extreme values of all 600 variables with 95th percentile without doing a lot of copy past work.

 

Any idea ?

 

thanks in advance

Ehsan

3 REPLIES 3
Rick_SAS
SAS Super FREQ

This is called "Winsorizing" the data.  When you Winsorize, you replace extreme values by a specified percentile. Usually Winsorization is a two-sided operation. For example, you replace small values by the 5th percentile and large values by the 95th percentile.

 

If your goal is to compute the Winsorized mean, this comes out of PROC UNIVARIATE automatically by using the WINSORIZED=0.05 option. 

 

If you absolutely insist on Winsorizing the data yourself, it can get a little tricky. Search this forum for "winsorize" and see comments by @FreelanceReinh. You can also read the article "How to Winsorize data in SAS", which explains some of the issues and offers a SAS/IML solution.

ehsanmath
Obsidian | Level 7

Hi guys,

 

Thanks a lot for your suggestions and help.

With your help i wrote the followoing code to solve my problem. I am just copy pasting my code here

--------------------------------START-------------------------------------------------

proc sql;
        select
            trim(name), trim(name)||'p' into :avl_selected_100K_spalten separated by ' ', :avl_selected_100K_spalten_P99 separated by ' '
        from
            dictionary.columns
        where
            libname = 'WORK'
            and
            memname = 'AVL_SELECTED_100K_PATCH'
            and
            upcase(name) not in('PERSON_ID');
    quit;

proc univariate data=AVL_SELECTED_100K_Patch noprint;
   var &avl_selected_100K_spalten.;
   output out=AVL_SELECTED_100K_P99 pctlpts  =99
                           pctlpre  = &avl_selected_100K_spalten.
                           pctlname = p;
run;

data AVL_SELECTED_100K_P99;
    set AVL_SELECTED_100K_P99;
    _TYPE_=0;
run;

PROC SORT
    DATA=AVL_SELECTED_100K_P99
    OUT=AVL_SELECTED_100K_P99_sort;
    BY _TYPE_;
RUN;

PROC TRANSPOSE DATA=AVL_SELECTED_100K_P99_sort
    OUT=Trans_AVL_SELECTED_100K_P99_sort
    PREFIX=Column
    NAME=Column_name
    LABEL=Label;
    BY _TYPE_;
    VAR &avl_selected_100K_spalten_P99.;
RUN;

data AVL_SELECTED_100K_P99 (drop=_TYPE_ Label rename=(column1=P99));
    set Trans_AVL_SELECTED_100K_P99_sort;
    Column_name=substrn(Column_name,1,length(Column_name)-1);
run;

proc sql ;
        select Column_name, P99 into :Column_nameP99 separated by ' ', :Column_nameP99_value separated by ' ' from AVL_SELECTED_100K_P99;
    quit;

%macro computePercentile();

    %let count = 0;
        %do %while (%qscan(&Column_nameP99.,%eval(&count + 1),%str( )) ne );
            %let count = %eval(&count + 1);
        %end;


    %do i = 1 %to &count;

    %let currentVar&i = %qscan(&Column_nameP99.,&i,%str( ));
    %let currentValue&i = %qscan(&Column_nameP99_value.,&i,%str( ));

    data AVL_SELECTED_100K_Patch;

        set AVL_SELECTED_100K_Patch;
 
        if &&currentVar&i.. gt &&currentValue&i.. then &&currentVar&i.. = &&currentValue&i..;
         
    run;
    %end;

%mend;

%computePercentile();

--------------------------------------END-----------------------------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 4355 views
  • 0 likes
  • 3 in conversation