BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14

Hello

For each custID+VAR I calculated UCL and LCL (control limits).

Then I want to calculate for each CustID+VAR the average without extreme values

Note-Extreme values are values lower LCL or higher UCL

What is the best way to do it

 


Data have;
input CustID mon X Y Z;
cards;
111 6 10 20 300 10 80
111 5 11 12 13 2 15
111 4 14 15 16 7 20
111 3 17 18 19 6 25
111 2 20 21 22 5 25
111 1 23 24 25 4 30
222 6 40 41 80 10 50 
222 5 37 38 39 12 50
222 4 15 13 12 2 20
222 3 12 14 16 5 20
222 2 50 30 20 5 40
222 1 15 16 40 3 25
;
Run;
proc sort data=have;by CustID mon ;Run; 
proc transpose data=have out=have2(Rename=(_NAME_=VAR  COL1=value));
by CustID mon ;
var X Y Z;
quit;
proc summary data=have2 nway;
class CustID var;
var VALUE;
output out=summary (drop=_type_ _freq_ rename=(CV=CV6 mean=AVG6 N=N6 std=std6 max=max6 min=min6)) cv=CV mean= mean n=n std=std min=min max=max;
run;
data summary2;
set summary;
UCL=AVG6+std6;
LCL=AVG6-std6;
Run;
 
Data have;
input CustID VAR $ AVG_no_EXT;
cards;
This is the requyired data set

 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

ADVICE: do NOT transpose long data sets. This is a really bad habit you have. Long data sets are already in the proper layout for analyzing.

 

proc summary data=have nway;
    class custid;
    var x y z;
    output out=stats mean= std=/autoname;
run;
data want; /* Set extreme values to missing */
    merge have stats;
    by custid;
    array data x y z;
    array avg x_mean--z_mean;
    array stddev x_stddev--z_stddev;
    do i=1 to dim(data);
        if data(i) > (avg(i)+stddev(i)) or data(i) < (avg(i)-stddev(i)) then data(i)=.;
    end;
    keep custid mon x y z;
run;
proc means data=want;
    var x y z;
    output out=means_no_extremes mean=;
run;

 

 

Even simpler, use PROC STDIZE (which is particularly helpful if you really have 75 variables instead of 3) and PROC STDIZE works with long (untransposed) data sets.

 

proc stdize data=have out=stdized sprefix=s_ oprefix=o_;
    by custid;
    var x y z;
run;
data want;
    set stdized;
    array s_ s_:; /* Array of standardized values */
    array o o_:; /* Array of original data */
     /* Determine which standardized values are beyond ±1, make the data missing in that case */
    do i=1 to dim(data); 
        if abs(s_(i)) > 1 then o(i)=.;
    end;
run;
proc summary data=want;
    by custid;
    var o_:;
    output out=means_no_extremes mean=;
run; 

 

Note 1: PROC STDIZE is clearly less programming than the first approach

 

Note 2: you do not have confidence limits or control limits, you are adding one standard deviation to the mean to get the upper limit, and subtracting one standarad deviation from the mean to get the lower limit. Do not call these confidence limits or control limits, they are not. You can call these upper and lower limits for the purpose of detecting outliers, if that's what you want. Even so, they are quite narrow for the purpose of detecting "extreme values", but how wide they should be is up to you.

--
Paige Miller

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

ADVICE: do NOT transpose long data sets. This is a really bad habit you have. Long data sets are already in the proper layout for analyzing.

 

proc summary data=have nway;
    class custid;
    var x y z;
    output out=stats mean= std=/autoname;
run;
data want; /* Set extreme values to missing */
    merge have stats;
    by custid;
    array data x y z;
    array avg x_mean--z_mean;
    array stddev x_stddev--z_stddev;
    do i=1 to dim(data);
        if data(i) > (avg(i)+stddev(i)) or data(i) < (avg(i)-stddev(i)) then data(i)=.;
    end;
    keep custid mon x y z;
run;
proc means data=want;
    var x y z;
    output out=means_no_extremes mean=;
run;

 

 

Even simpler, use PROC STDIZE (which is particularly helpful if you really have 75 variables instead of 3) and PROC STDIZE works with long (untransposed) data sets.

 

proc stdize data=have out=stdized sprefix=s_ oprefix=o_;
    by custid;
    var x y z;
run;
data want;
    set stdized;
    array s_ s_:; /* Array of standardized values */
    array o o_:; /* Array of original data */
     /* Determine which standardized values are beyond ±1, make the data missing in that case */
    do i=1 to dim(data); 
        if abs(s_(i)) > 1 then o(i)=.;
    end;
run;
proc summary data=want;
    by custid;
    var o_:;
    output out=means_no_extremes mean=;
run; 

 

Note 1: PROC STDIZE is clearly less programming than the first approach

 

Note 2: you do not have confidence limits or control limits, you are adding one standard deviation to the mean to get the upper limit, and subtracting one standarad deviation from the mean to get the lower limit. Do not call these confidence limits or control limits, they are not. You can call these upper and lower limits for the purpose of detecting outliers, if that's what you want. Even so, they are quite narrow for the purpose of detecting "extreme values", but how wide they should be is up to you.

--
Paige Miller
Ronein
Meteorite | Level 14

It is perfect, however final Avg without extreme should be done for each customer 

proc means data=want;
by CustID;
    var x y z;
    output out=means_no_extremes mean=;
run;
PaigeMiller
Diamond | Level 26

Ok, good catch, and you were able to fix it. Actually, for the PROC STDIZE solution I did include BY CUSTID.

--
Paige Miller
Ksharp
Super User
You could use "winsorized" way by
proc univariate data=&dsname winsorized=0.1;
ods output WinsorizedMeans=winMeans;
run;

Check @Rick_SAS blogs:
https://blogs.sas.com/content/iml/2015/07/15/winsorize-data.html
https://blogs.sas.com/content/iml/2017/02/08/winsorization-good-bad-and-ugly.html

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 233 views
  • 2 likes
  • 3 in conversation