BookmarkSubscribeRSS Feed
PeteCthulu
Fluorite | Level 6

I'd like to winsorize my data from a self-paced reading study. In particular, (in the sample data below) I'd like to make it so that any value in "responsetime" that is more than 2.5 times the standard deviation away from the mean for a given "region", then that too-high value will be replaced by 2.5 times the standard deviation plus the mean. For example, region 4 (with values 304, 510, 331, 474, 1347, 379, 1072, 854, 542, 547, 340, 410) has a mean of 592.5 and a standard deviation of 314.6. Therefore, any value for region 3 that is more than 1378.7 should be replaced by 1378.7. In this sample, none of the values would be replaced. Is there a relatively simple way to command SAS to do this for my whole data set?

 

I'm also uncertain whether this should be done by conditions and regions, or just by regions. My thinking is that it's more conservative to do it by regions only.

 

 

data reading2;

input subject

lines subject$ item $ region $ responsetime verbtype$ animate$;

40044304AlternatingAnimate
40045498AlternatingAnimate
40046340AlternatingAnimate
40047485AlternatingAnimate
400154510NonAlternatingInanimate
400155304NonAlternatingInanimate
400156371NonAlternatingInanimate
400157483NonAlternatingInanimate
400234331NonAlternatingInanimate
400235332NonAlternatingInanimate
400236332NonAlternatingInanimate
400237357NonAlternatingInanimate
40064474AlternatingAnimate
40065309AlternatingAnimate
40066335AlternatingAnimate
40067337AlternatingAnimate
4012641347NonAlternatingInanimate
401265514NonAlternatingInanimate
401266484NonAlternatingInanimate
401267527NonAlternatingInanimate
40154379AlternatingAnimate
40155378AlternatingAnimate
401561720AlternatingAnimate
40157399AlternatingAnimate
4011941072NonAlternatingAnimate
401195459NonAlternatingAnimate
401196679NonAlternatingAnimate
401197985NonAlternatingAnimate
401184854NonAlternatingInanimate
401185434NonAlternatingInanimate
401186959NonAlternatingInanimate
401187438NonAlternatingInanimate
401174542NonAlternatingAnimate
401175374NonAlternatingAnimate
401176398NonAlternatingAnimate
401177428NonAlternatingAnimate
401234547NonAlternatingAnimate
401235522NonAlternatingAnimate
401236579NonAlternatingAnimate
401237401NonAlternatingAnimate
40124340AlternatingInanimate
40125437AlternatingInanimate
40126361AlternatingInanimate
40127369AlternatingInanimate
401134410AlternatingAnimate
401135392AlternatingAnimate
401136702AlternatingAnimate
4011371280Alternating

Animate

;

 

3 REPLIES 3
PeteCthulu
Fluorite | Level 6
Oops
I mean 1379, not 1378.7. Either way, no values would be replaced in this set.
ballardw
Super User

I think this may do what you are asking for.

proc sql;
   create table want as
   select a.lines, a.subject, a.item,min(a.responstime,b.limit) as responsetime
         , a.region, a.verbtype
   from have as a
        left join
        (select region,mean(responsetime)+2.5*std(responsetime) as limit
         from have
         group by region) as b
      on a.region=b.region
  ;
quit; 

I would be very careful about actually replacing values in tables until you are sure of the result to avoid destroying the existing data (incorrect code, losing variables or such).

 

 

You did not mention any lower limit check so none is attempted.

Rick_SAS
SAS Super FREQ

You can certainly carry out the scheme that you request, but be aware that "to Winsorize data" has a particular statistical meaning that is different from what you describe. For the statistical definition, see "How to Winsorize data in SAS."  For a discussion of the pros and cons of trimming the extreme values, see "Winsorization: The good, the bad, and the ugly."

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 329 views
  • 0 likes
  • 3 in conversation