Solved: Re: Select top 1 (highest value )per section and day in SAS EG

AK100 · Posted 04-01-2020 02:33 AM

Hello All,

I might ask an easy question but I want the following. I want to select the top 1 highest value per SECTION and per DAY. Please see the example below:

*This is the data I have right now (with thousands of rows)

ID -- Location -- Section -- Date -- Time -- Type -- Value

01 -- Germany -- BB --1/1/20 -- 10:10 -- VW -- 9,0

02 -- Germany -- AA --1/1/20 -- 11:20 -- AU -- 7,0

03 -- Germany -- AA --1/1/20 -- 13:50 -- AU -- 6,0

04 -- Germany -- BB --1/1/20 -- 13:50 -- AU -- 5,7

05 -- Germany -- BB --2/1/20 -- 15:10 -- MC -- 11,3

06 -- Germany -- BB --2/1/20 -- 19:10 -- VW -- 12,0

07 -- Germany -- AA --2/1/20 -- 15:20 -- OP -- 6,7

08 -- Germany -- AA --2/1/20 -- 15:20 -- AU -- 2,0

09 -- Germany -- BB --3/1/20 -- 16:16 -- MC -- 1,5

10 -- Germany -- BB --3/1/20 -- 11:45 -- MC -- 5,8

11 -- Germany -- AA --3/1/20 -- 09:40 -- OP -- 2,7

12 -- Germany -- AA --3/1/20 -- 02:40 -- VW -- 2,0

*This is the data I want:

ID -- Location -- Section -- Date -- Time -- Type -- Value

01 -- Germany -- BB --1/1/20 -- 10:10 -- VW -- 9,0

02 -- Germany -- AA --1/1/20 -- 11:20 -- AU -- 7,0

06 -- Germany -- BB --2/1/20 -- 19:10 -- VW -- 12,0

07 -- Germany -- AA --2/1/20 -- 15:20 -- OP -- 6,7

10 -- Germany -- BB --3/1/20 -- 11:45 -- MC -- 5,8

11 -- Germany -- AA --3/1/20 -- 09:40 -- OP -- 2,7

I appreciate your help all 🙂 Thank you in advance!

Kurt_Bremser · Posted 04-01-2020 02:52 AM

proc sort data=have;
by section date descending value;
run;

data want;
set have;
by section date;
if first.date;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Shmuel · Posted 04-01-2020 02:48 AM

use proc means to get what you want:

proc means data=have;
   class location section date;
   var value;
   output out=want(drop=_type_ count) max=;
run;

AK100 · Posted 04-01-2020 03:03 AM

It almost worked, but thank you in advance!

Kurt_Bremser · Posted 04-01-2020 02:52 AM

proc sort data=have;
by section date descending value;
run;

data want;
set have;
by section date;
if first.date;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 04-01-2020 03:02 AM

Thank you @Kurt, your good. You made it look really simple!

By the way. I'am still working on that conversion to data steps thing. I know it makes it easier for you guys to check the data and solve the problems. Hope I can use it soon.

Kurt_Bremser · Posted 04-01-2020 03:14 AM

For a small dataset like yours, the data2datastep macro is not needed, it's easier to write the data step directly:

data have;
input
  ID :$2.
  Location :$10.
  Section :$2.
  Date :ddmmyy10.
  Time :time5.
  Type :$2.
  Value :commax10.
;
format
  date ddmmyyp10.
  time time5.
  value commax10.2
;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
;

As I said, not rocket science at all.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 04-01-2020 03:20 AM

Here, it is done in a single pass and with the desired result.

data have;
input ID $ Location $ Section $ Date :ddmmyy10. Time time5. Type $ Value :comma8.1;
format Time hhmm5.;
format Date ddmmyy10.;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
03 Germany AA 1/1/20 13:50 AU 6,0
04 Germany BB 1/1/20 13:50 AU 5,7
05 Germany BB 2/1/20 15:10 MC 11,3
06 Germany BB 2/1/20 19:10 VW 12,0
07 Germany AA 2/1/20 15:20 OP 6,7
08 Germany AA 2/1/20 15:20 AU 2,0
09 Germany BB 3/1/20 16:16 MC 1,5
10 Germany BB 3/1/20 11:45 MC 5,8
11 Germany AA 3/1/20 09:40 OP 2,7
12 Germany AA 3/1/20 02:40 VW 2,0
;

data want (drop=rc);
   if _N_ = 1 then do;
      declare hash h (dataset : 'have(obs=0)', ordered : 'd', multidata : 'y');
      h.definekey ('Section', 'Value');
      h.definedata (all : 'y');
      h.definedone();
      declare hiter hi ('h');
   end;

   do until (last.Section);
      set have;
      by Section notsorted Date;
      h.add();
   end;

   rc = hi.next();  
   output; 
   rc = hi.prev();
   h.clear();    
run;

Result:

ID Location Section Date       Time  Type Value 
01 Germany  BB      01/01/1920 10:10 VW   9.0 
02 Germany  AA      01/01/1920 11:20 AU   7.0 
06 Germany  BB      02/01/1920 19:10 VW   12.0 
07 Germany  AA      02/01/1920 15:20 OP   6.7 
10 Germany  BB      03/01/1920 11:45 MC   5.8 
11 Germany  AA      03/01/1920 9:40  OP   2.7

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 04-01-2020 03:31 AM

Hi @PeterClemmensen

That does not deal with all cases correctly. Feed it this data:

data have;
input ID $ Location $ Section $ Date :ddmmyy10. Time time5. Type $ Value :comma8.1;
format Time hhmm5.;
format Date ddmmyy10.;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
03 Germany AA 1/1/20 13:50 AU 6,0
04 Germany BB 1/1/20 13:50 AU 10,7
;

and you'll get two entries for BB instead of only one.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 04-01-2020 03:40 AM

@Kurt_Bremser isn't that what the OP want data set implies?

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 04-01-2020 03:55 AM

@PeterClemmensen wrote:

@Kurt_Bremser isn't that what the OP want data set implies?

That's only because the "have" dataset in the OP does not have one single case where a higher value follows later in the day, with another section appearing in between. That's why I subtly changed the input data to have this kind of "edge case".

The OP rule is "one value for a combination of section and date", and the hash code throws two values for 01/01 and BB (with my changed data). You would need to fill the hash per date, then iterate through the whole hash object to get all firsts per section, and then clear the hash to make the processing of the next day easier.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 04-01-2020 03:45 AM

@Kurt_Bremser an other quick case, what if I not want to select the top 1 per SECTION and per DAY but the TOP 4 per SECTION and DAY and add like a RANKING column to show the ranking from 1-4. How will your code below look like then?

proc sort data=have;
by section date descending value;
run;

data want;
set have;
by section date;
if first.date;
run;

Kurt_Bremser · Posted 04-01-2020 03:59 AM

@AK100 wrote:

@Kurt_Bremser an other quick case, what if I not want to select the top 1 per SECTION and per DAY but the TOP 4 per SECTION and DAY and add like a RANKING column to show the ranking from 1-4. How will your code below look like then?
proc sort data=have;
by section date descending value;
run;

data want;
set have;
by section date;
if first.date;
run;

That's easy. At first.date, set a (retained) counter to 1 (instead of the subsetting if you have now), and increment it else. Use a subsetting if to only output when the counter is le 4.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 04-01-2020 04:11 AM

@Kurt_Bremser Okay, I just figured out this one (which also gives me the desired output). Is is the same as yours or does it look slightly different?

proc sort data=have;
	by section date descending value;
run;

data top4;
	set have;
	by section date;
	retain ranking;

	if first.date then
		ranking =0;
	ranking +1;

	if ranking <5;
run;

@

Kurt_Bremser · Posted 04-01-2020 04:17 AM

This looks good (and is valid), but I would do it slightly different:

data top4;
set have;
by section date;
if first.date
then ranking = 1;
else ranking + 1; /* when you use this type of increment statement, the variable is automatically retained */
if ranking le 4;
run;

This is purely a matter of "code cosmetics"; at first.date, there is one statement less to execute.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

AK100 · Posted 04-01-2020 04:19 AM

Thank You 🙂

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away