DATA Step, Macro, Functions and more

trimmed mean

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

trimmed mean

Hello expert

I want to trim my data by eliminate highest and lowest (G1-G13) for each observation from my data set . 

data grades;
infile '/folders/myshortcuts/SASUniversityEdition/module 7/grades.txt';
input id 4. @6 G1 1. G2 1. G3 1.
G4 1. G5 1. G6 1. G7 1. G8 1.
G9 1. G10 1. G11 1. G12 1. G13 1.
;
total = G1+G2+G3+G4+G5+G6+G7
+G8+G9+G10+G11+G12+G13;
proc
print
data = grades;

 here is my output

Obs id G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 total 1 2 3

1105875837788888691
1294888757888858290
2009863846778704775
 

Accepted Solutions
Solution
‎07-27-2017 11:45 PM
Super User
Posts: 5,504

Re: trimmed mean

It's not clear what "trim" your data means.  You can't get rid of variables, because G1 might contain the largest value on one observation, but a middling value on another observation.

 

GIven what you wrote so far, you could conceivably use:

 

trimmed_total = total - ( max(of g1-g13) + min(of g1-g13) ) ;

View solution in original post


All Replies
Super User
Posts: 19,792

Re: trimmed mean

What do you want as output?

Regular Contributor
Regular Contributor
Posts: 156

Re: trimmed mean

[ Edited ]

I think you have an unfortunate nonnormalized data structure that makes the task harder.

 

If I take

data grades;
   input id grade1 grade2 grade3 grade4;
datalines;
1105 1 2 3 4
1106 2 2 2 2
1107 3 3 4 6
;
run;

and flip it to a normalized structure like

proc sql;
    create table grades_normalized as 
    select id as id, 'assignment1' as assignment, grade1 as grade
    from grades
    union
    select id, 'assignment2', grade2
    from grades
    union
    select id, 'assignment3', grade3
    from grades
    union
    select id, 'assignment4', grade4
    from grades;
quit;

 

 

I get a table that looks like

                              Obs     id     assignment     grade

                                1    1105    assignment1      1
                                2    1105    assignment2      2
                                3    1105    assignment3      3
                                4    1105    assignment4      4
                                5    1106    assignment1      2
                                6    1106    assignment2      2
                                7    1106    assignment3      2
                                8    1106    assignment4      2
                                9    1107    assignment1      3
                               10    1107    assignment2      3
                               11    1107    assignment3      4
                               12    1107    assignment4      6

With that structure, I can sort (not needed for this data, but you never know) and trim.

proc sort data= grades_normalized;
 	by id grade;
run;

data trimmed_grades;
	set grades_normalized;
	by id;
	if first.id then delete;
	if last.id then delete;
run;

then summarize

proc sql;
	create table grade_report as 
	select id as id, sum(grade) as grade_total
	from trimmed_grades
	group by id
	order by id;
quit;

and end up with

 

                                                     grade_
                                     Obs     id      total

                                      1     1105       5
                                      2     1106       4
                                      3     1107       7

Now one of the experts around here will post a solution that does this all in the data step or with a macro or something but they haven't replied so you are stuck with me.

 

 

 

 

EDIT:

 

Did I not say somebody better than me would come along and do it in the data step?

data grades;
input id 4. g1-g13;
datalines;
1105 8 7 5 8 3 7 7 8 8 8 8 8 6 
1294 8 8 8 7 5 7 8 8 8 8 5 8 2 
2009 8 6 3 8 4 6 7 7 8 7 0 4 7 
;
run;

data want;
   set grades;
   total = sum( of g1-g13);
   trim_total = total - max(of g1-g13) - min(of g1-g13);
run;
Super User
Posts: 11,343

Re: trimmed mean

Simpler to read the data differently perhaps to get the normalized structure:

 

data grades;
   input id 4. @ ;
   do assignment=1 to 13;
      input grade @;
      output;
   end;
   input;
datalines;
1105 8 7 5 8 3 7 7 8 8 8 8 8 6 91 
1294 8 8 8 7 5 7 8 8 8 8 5 8 2 90 
2009 8 6 3 8 4 6 7 7 8 7 0 4 7 75 
;
run;

which ignores the "total" in the example data provided by OP.

 

Super User
Posts: 11,343

Re: trimmed mean

It really isn't obvious what your desired output is.

If the resulting order of the G variables is not critical here's a possible start;

data grades;
input id 4. g1-g13;
datalines;
1105 8 7 5 8 3 7 7 8 8 8 8 8 6 91 
1294 8 8 8 7 5 7 8 8 8 8 5 8 2 90 
2009 8 6 3 8 4 6 7 7 8 7 0 4 7 75 
;
run;

data want;
   set grades;
   array g g1-g13;
   call sortn(of g(*));
   total = sum( of g2-g12);
run;

Note that I have posted a way to duplicate your data (at least your example output) since we do not have your file. Note that your data step as written may yield surprising results if your data contains values like 10 or 100.

 

 

The second data step, which could actually be done in the step that reads the data, sorts the values of G1 - G13.

If you want a trimmed total that is what I show. If you want to actually remove the the largest and smallest values the you could drop the variables G1 and G13.

However this approach may only work if all of G1 to G13 exist.

Super User
Posts: 19,792

Re: trimmed mean

And how do you want to handle ties? If there are 4 8's are you removing all?


I would use LARGEST/SMALLEST to remove the values not needed but without what you want as output and what you have as input we can't really connect all the dots yet.

Solution
‎07-27-2017 11:45 PM
Super User
Posts: 5,504

Re: trimmed mean

It's not clear what "trim" your data means.  You can't get rid of variables, because G1 might contain the largest value on one observation, but a middling value on another observation.

 

GIven what you wrote so far, you could conceivably use:

 

trimmed_total = total - ( max(of g1-g13) + min(of g1-g13) ) ;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 292 views
  • 3 likes
  • 5 in conversation