BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Koke
Fluorite | Level 6

Hello expert

I want to trim my data by eliminate highest and lowest (G1-G13) for each observation from my data set . 

data grades;
infile '/folders/myshortcuts/SASUniversityEdition/module 7/grades.txt';
input id 4. @6 G1 1. G2 1. G3 1.
G4 1. G5 1. G6 1. G7 1. G8 1.
G9 1. G10 1. G11 1. G12 1. G13 1.
;
total = G1+G2+G3+G4+G5+G6+G7
+G8+G9+G10+G11+G12+G13;
proc
print
data = grades;

 here is my output

Obs id G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 total 1 2 3

1105875837788888691
1294888757888858290
2009863846778704775
 
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

It's not clear what "trim" your data means.  You can't get rid of variables, because G1 might contain the largest value on one observation, but a middling value on another observation.

 

GIven what you wrote so far, you could conceivably use:

 

trimmed_total = total - ( max(of g1-g13) + min(of g1-g13) ) ;

View solution in original post

6 REPLIES 6
Reeza
Super User

What do you want as output?

HB
Barite | Level 11 HB
Barite | Level 11

I think you have an unfortunate nonnormalized data structure that makes the task harder.

 

If I take

data grades;
   input id grade1 grade2 grade3 grade4;
datalines;
1105 1 2 3 4
1106 2 2 2 2
1107 3 3 4 6
;
run;

and flip it to a normalized structure like

proc sql;
    create table grades_normalized as 
    select id as id, 'assignment1' as assignment, grade1 as grade
    from grades
    union
    select id, 'assignment2', grade2
    from grades
    union
    select id, 'assignment3', grade3
    from grades
    union
    select id, 'assignment4', grade4
    from grades;
quit;

 

 

I get a table that looks like

                              Obs     id     assignment     grade

                                1    1105    assignment1      1
                                2    1105    assignment2      2
                                3    1105    assignment3      3
                                4    1105    assignment4      4
                                5    1106    assignment1      2
                                6    1106    assignment2      2
                                7    1106    assignment3      2
                                8    1106    assignment4      2
                                9    1107    assignment1      3
                               10    1107    assignment2      3
                               11    1107    assignment3      4
                               12    1107    assignment4      6

With that structure, I can sort (not needed for this data, but you never know) and trim.

proc sort data= grades_normalized;
 	by id grade;
run;

data trimmed_grades;
	set grades_normalized;
	by id;
	if first.id then delete;
	if last.id then delete;
run;

then summarize

proc sql;
	create table grade_report as 
	select id as id, sum(grade) as grade_total
	from trimmed_grades
	group by id
	order by id;
quit;

and end up with

 

                                                     grade_
                                     Obs     id      total

                                      1     1105       5
                                      2     1106       4
                                      3     1107       7

Now one of the experts around here will post a solution that does this all in the data step or with a macro or something but they haven't replied so you are stuck with me.

 

 

 

 

EDIT:

 

Did I not say somebody better than me would come along and do it in the data step?

data grades;
input id 4. g1-g13;
datalines;
1105 8 7 5 8 3 7 7 8 8 8 8 8 6 
1294 8 8 8 7 5 7 8 8 8 8 5 8 2 
2009 8 6 3 8 4 6 7 7 8 7 0 4 7 
;
run;

data want;
   set grades;
   total = sum( of g1-g13);
   trim_total = total - max(of g1-g13) - min(of g1-g13);
run;
ballardw
Super User

Simpler to read the data differently perhaps to get the normalized structure:

 

data grades;
   input id 4. @ ;
   do assignment=1 to 13;
      input grade @;
      output;
   end;
   input;
datalines;
1105 8 7 5 8 3 7 7 8 8 8 8 8 6 91 
1294 8 8 8 7 5 7 8 8 8 8 5 8 2 90 
2009 8 6 3 8 4 6 7 7 8 7 0 4 7 75 
;
run;

which ignores the "total" in the example data provided by OP.

 

ballardw
Super User

It really isn't obvious what your desired output is.

If the resulting order of the G variables is not critical here's a possible start;

data grades;
input id 4. g1-g13;
datalines;
1105 8 7 5 8 3 7 7 8 8 8 8 8 6 91 
1294 8 8 8 7 5 7 8 8 8 8 5 8 2 90 
2009 8 6 3 8 4 6 7 7 8 7 0 4 7 75 
;
run;

data want;
   set grades;
   array g g1-g13;
   call sortn(of g(*));
   total = sum( of g2-g12);
run;

Note that I have posted a way to duplicate your data (at least your example output) since we do not have your file. Note that your data step as written may yield surprising results if your data contains values like 10 or 100.

 

 

The second data step, which could actually be done in the step that reads the data, sorts the values of G1 - G13.

If you want a trimmed total that is what I show. If you want to actually remove the the largest and smallest values the you could drop the variables G1 and G13.

However this approach may only work if all of G1 to G13 exist.

Reeza
Super User

And how do you want to handle ties? If there are 4 8's are you removing all?


I would use LARGEST/SMALLEST to remove the values not needed but without what you want as output and what you have as input we can't really connect all the dots yet.

Astounding
PROC Star

It's not clear what "trim" your data means.  You can't get rid of variables, because G1 might contain the largest value on one observation, but a middling value on another observation.

 

GIven what you wrote so far, you could conceivably use:

 

trimmed_total = total - ( max(of g1-g13) + min(of g1-g13) ) ;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 3361 views
  • 3 likes
  • 5 in conversation