- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello expert
I want to trim my data by eliminate highest and lowest (G1-G13) for each observation from my data set .
data grades;
infile '/folders/myshortcuts/SASUniversityEdition/module 7/grades.txt';
input id 4. @6 G1 1. G2 1. G3 1.
G4 1. G5 1. G6 1. G7 1. G8 1.
G9 1. G10 1. G11 1. G12 1. G13 1.
;
total = G1+G2+G3+G4+G5+G6+G7
+G8+G9+G10+G11+G12+G13;
proc
data = grades;
here is my output
Obs id G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 total 1 2 3
1105 | 8 | 7 | 5 | 8 | 3 | 7 | 7 | 8 | 8 | 8 | 8 | 8 | 6 | 91 |
1294 | 8 | 8 | 8 | 7 | 5 | 7 | 8 | 8 | 8 | 8 | 5 | 8 | 2 | 90 |
2009 | 8 | 6 | 3 | 8 | 4 | 6 | 7 | 7 | 8 | 7 | 0 | 4 | 7 | 75 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's not clear what "trim" your data means. You can't get rid of variables, because G1 might contain the largest value on one observation, but a middling value on another observation.
GIven what you wrote so far, you could conceivably use:
trimmed_total = total - ( max(of g1-g13) + min(of g1-g13) ) ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What do you want as output?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you have an unfortunate nonnormalized data structure that makes the task harder.
If I take
data grades;
input id grade1 grade2 grade3 grade4;
datalines;
1105 1 2 3 4
1106 2 2 2 2
1107 3 3 4 6
;
run;
and flip it to a normalized structure like
proc sql;
create table grades_normalized as
select id as id, 'assignment1' as assignment, grade1 as grade
from grades
union
select id, 'assignment2', grade2
from grades
union
select id, 'assignment3', grade3
from grades
union
select id, 'assignment4', grade4
from grades;
quit;
I get a table that looks like
Obs id assignment grade
1 1105 assignment1 1
2 1105 assignment2 2
3 1105 assignment3 3
4 1105 assignment4 4
5 1106 assignment1 2
6 1106 assignment2 2
7 1106 assignment3 2
8 1106 assignment4 2
9 1107 assignment1 3
10 1107 assignment2 3
11 1107 assignment3 4
12 1107 assignment4 6
With that structure, I can sort (not needed for this data, but you never know) and trim.
proc sort data= grades_normalized;
by id grade;
run;
data trimmed_grades;
set grades_normalized;
by id;
if first.id then delete;
if last.id then delete;
run;
then summarize
proc sql;
create table grade_report as
select id as id, sum(grade) as grade_total
from trimmed_grades
group by id
order by id;
quit;
and end up with
grade_
Obs id total
1 1105 5
2 1106 4
3 1107 7
Now one of the experts around here will post a solution that does this all in the data step or with a macro or something but they haven't replied so you are stuck with me.
EDIT:
Did I not say somebody better than me would come along and do it in the data step?
data grades;
input id 4. g1-g13;
datalines;
1105 8 7 5 8 3 7 7 8 8 8 8 8 6
1294 8 8 8 7 5 7 8 8 8 8 5 8 2
2009 8 6 3 8 4 6 7 7 8 7 0 4 7
;
run;
data want;
set grades;
total = sum( of g1-g13);
trim_total = total - max(of g1-g13) - min(of g1-g13);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Simpler to read the data differently perhaps to get the normalized structure:
data grades; input id 4. @ ; do assignment=1 to 13; input grade @; output; end; input; datalines; 1105 8 7 5 8 3 7 7 8 8 8 8 8 6 91 1294 8 8 8 7 5 7 8 8 8 8 5 8 2 90 2009 8 6 3 8 4 6 7 7 8 7 0 4 7 75 ; run;
which ignores the "total" in the example data provided by OP.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It really isn't obvious what your desired output is.
If the resulting order of the G variables is not critical here's a possible start;
data grades; input id 4. g1-g13; datalines; 1105 8 7 5 8 3 7 7 8 8 8 8 8 6 91 1294 8 8 8 7 5 7 8 8 8 8 5 8 2 90 2009 8 6 3 8 4 6 7 7 8 7 0 4 7 75 ; run; data want; set grades; array g g1-g13; call sortn(of g(*)); total = sum( of g2-g12); run;
Note that I have posted a way to duplicate your data (at least your example output) since we do not have your file. Note that your data step as written may yield surprising results if your data contains values like 10 or 100.
The second data step, which could actually be done in the step that reads the data, sorts the values of G1 - G13.
If you want a trimmed total that is what I show. If you want to actually remove the the largest and smallest values the you could drop the variables G1 and G13.
However this approach may only work if all of G1 to G13 exist.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And how do you want to handle ties? If there are 4 8's are you removing all?
I would use LARGEST/SMALLEST to remove the values not needed but without what you want as output and what you have as input we can't really connect all the dots yet.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's not clear what "trim" your data means. You can't get rid of variables, because G1 might contain the largest value on one observation, but a middling value on another observation.
GIven what you wrote so far, you could conceivably use:
trimmed_total = total - ( max(of g1-g13) + min(of g1-g13) ) ;