Hello All,
I might ask an easy question but I want the following. I want to select the top 1 highest value per SECTION and per DAY. Please see the example below:
*This is the data I have right now (with thousands of rows)
ID -- Location -- Section -- Date -- Time -- Type -- Value
01 -- Germany -- BB --1/1/20 -- 10:10 -- VW -- 9,0
02 -- Germany -- AA --1/1/20 -- 11:20 -- AU -- 7,0
03 -- Germany -- AA --1/1/20 -- 13:50 -- AU -- 6,0
04 -- Germany -- BB --1/1/20 -- 13:50 -- AU -- 5,7
05 -- Germany -- BB --2/1/20 -- 15:10 -- MC -- 11,3
06 -- Germany -- BB --2/1/20 -- 19:10 -- VW -- 12,0
07 -- Germany -- AA --2/1/20 -- 15:20 -- OP -- 6,7
08 -- Germany -- AA --2/1/20 -- 15:20 -- AU -- 2,0
09 -- Germany -- BB --3/1/20 -- 16:16 -- MC -- 1,5
10 -- Germany -- BB --3/1/20 -- 11:45 -- MC -- 5,8
11 -- Germany -- AA --3/1/20 -- 09:40 -- OP -- 2,7
12 -- Germany -- AA --3/1/20 -- 02:40 -- VW -- 2,0
*This is the data I want:
ID -- Location -- Section -- Date -- Time -- Type -- Value
01 -- Germany -- BB --1/1/20 -- 10:10 -- VW -- 9,0
02 -- Germany -- AA --1/1/20 -- 11:20 -- AU -- 7,0
06 -- Germany -- BB --2/1/20 -- 19:10 -- VW -- 12,0
07 -- Germany -- AA --2/1/20 -- 15:20 -- OP -- 6,7
10 -- Germany -- BB --3/1/20 -- 11:45 -- MC -- 5,8
11 -- Germany -- AA --3/1/20 -- 09:40 -- OP -- 2,7
I appreciate your help all 🙂 Thank you in advance!
proc sort data=have;
by section date descending value;
run;
data want;
set have;
by section date;
if first.date;
run;
use proc means to get what you want:
proc means data=have;
class location section date;
var value;
output out=want(drop=_type_ count) max=;
run;
proc sort data=have;
by section date descending value;
run;
data want;
set have;
by section date;
if first.date;
run;
For a small dataset like yours, the data2datastep macro is not needed, it's easier to write the data step directly:
data have;
input
ID :$2.
Location :$10.
Section :$2.
Date :ddmmyy10.
Time :time5.
Type :$2.
Value :commax10.
;
format
date ddmmyyp10.
time time5.
value commax10.2
;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
;
As I said, not rocket science at all.
Here, it is done in a single pass and with the desired result.
data have;
input ID $ Location $ Section $ Date :ddmmyy10. Time time5. Type $ Value :comma8.1;
format Time hhmm5.;
format Date ddmmyy10.;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
03 Germany AA 1/1/20 13:50 AU 6,0
04 Germany BB 1/1/20 13:50 AU 5,7
05 Germany BB 2/1/20 15:10 MC 11,3
06 Germany BB 2/1/20 19:10 VW 12,0
07 Germany AA 2/1/20 15:20 OP 6,7
08 Germany AA 2/1/20 15:20 AU 2,0
09 Germany BB 3/1/20 16:16 MC 1,5
10 Germany BB 3/1/20 11:45 MC 5,8
11 Germany AA 3/1/20 09:40 OP 2,7
12 Germany AA 3/1/20 02:40 VW 2,0
;
data want (drop=rc);
if _N_ = 1 then do;
declare hash h (dataset : 'have(obs=0)', ordered : 'd', multidata : 'y');
h.definekey ('Section', 'Value');
h.definedata (all : 'y');
h.definedone();
declare hiter hi ('h');
end;
do until (last.Section);
set have;
by Section notsorted Date;
h.add();
end;
rc = hi.next();
output;
rc = hi.prev();
h.clear();
run;
Result:
ID Location Section Date Time Type Value 01 Germany BB 01/01/1920 10:10 VW 9.0 02 Germany AA 01/01/1920 11:20 AU 7.0 06 Germany BB 02/01/1920 19:10 VW 12.0 07 Germany AA 02/01/1920 15:20 OP 6.7 10 Germany BB 03/01/1920 11:45 MC 5.8 11 Germany AA 03/01/1920 9:40 OP 2.7
That does not deal with all cases correctly. Feed it this data:
data have;
input ID $ Location $ Section $ Date :ddmmyy10. Time time5. Type $ Value :comma8.1;
format Time hhmm5.;
format Date ddmmyy10.;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
03 Germany AA 1/1/20 13:50 AU 6,0
04 Germany BB 1/1/20 13:50 AU 10,7
;
and you'll get two entries for BB instead of only one.
@Kurt_Bremser isn't that what the OP want data set implies?
@PeterClemmensen wrote:
@Kurt_Bremser isn't that what the OP want data set implies?
That's only because the "have" dataset in the OP does not have one single case where a higher value follows later in the day, with another section appearing in between. That's why I subtly changed the input data to have this kind of "edge case".
The OP rule is "one value for a combination of section and date", and the hash code throws two values for 01/01 and BB (with my changed data). You would need to fill the hash per date, then iterate through the whole hash object to get all firsts per section, and then clear the hash to make the processing of the next day easier.
@Kurt_Bremser an other quick case, what if I not want to select the top 1 per SECTION and per DAY but the TOP 4 per SECTION and DAY and add like a RANKING column to show the ranking from 1-4. How will your code below look like then?
proc sort data=have;
by section date descending value;
run;
data want;
set have;
by section date;
if first.date;
run;
@AK100 wrote:
@Kurt_Bremser an other quick case, what if I not want to select the top 1 per SECTION and per DAY but the TOP 4 per SECTION and DAY and add like a RANKING column to show the ranking from 1-4. How will your code below look like then?
proc sort data=have; by section date descending value; run; data want; set have; by section date; if first.date; run;
That's easy. At first.date, set a (retained) counter to 1 (instead of the subsetting if you have now), and increment it else. Use a subsetting if to only output when the counter is le 4.
@Kurt_Bremser Okay, I just figured out this one (which also gives me the desired output). Is is the same as yours or does it look slightly different?
proc sort data=have;
by section date descending value;
run;
data top4;
set have;
by section date;
retain ranking;
if first.date then
ranking =0;
ranking +1;
if ranking <5;
run;
@
This looks good (and is valid), but I would do it slightly different:
data top4;
set have;
by section date;
if first.date
then ranking = 1;
else ranking + 1; /* when you use this type of increment statement, the variable is automatically retained */
if ranking le 4;
run;
This is purely a matter of "code cosmetics"; at first.date, there is one statement less to execute.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.