BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AK100
Pyrite | Level 9

Hello All, 

 

I might ask an easy question but I want the following. I want to select the top 1 highest value per SECTION and per DAY. Please see the example below:

 

*This is the data I have right now (with thousands of rows)

ID -- Location -- Section -- Date -- Time -- Type -- Value

01 -- Germany -- BB         --1/1/20 -- 10:10 -- VW --   9,0

02 -- Germany -- AA         --1/1/20 -- 11:20 -- AU --    7,0

03 -- Germany -- AA         --1/1/20 -- 13:50 -- AU --    6,0

04 -- Germany -- BB         --1/1/20 -- 13:50 -- AU --    5,7

05 -- Germany -- BB         --2/1/20 -- 15:10 -- MC --  11,3

06 -- Germany -- BB         --2/1/20 -- 19:10 -- VW --  12,0

07 -- Germany -- AA         --2/1/20 -- 15:20 -- OP --    6,7

08 -- Germany -- AA         --2/1/20 -- 15:20 -- AU --    2,0

09 -- Germany -- BB         --3/1/20 -- 16:16 -- MC --   1,5

10 -- Germany -- BB         --3/1/20 -- 11:45 -- MC --   5,8

11 -- Germany -- AA         --3/1/20 -- 09:40 -- OP --   2,7

12 -- Germany -- AA         --3/1/20 -- 02:40 -- VW --   2,0

 

*This is the data I want:

ID -- Location -- Section -- Date -- Time -- Type -- Value

01 -- Germany -- BB         --1/1/20 -- 10:10 -- VW --   9,0

02 -- Germany -- AA         --1/1/20 -- 11:20 -- AU --    7,0

06 -- Germany -- BB         --2/1/20 -- 19:10 -- VW --  12,0

07 -- Germany -- AA         --2/1/20 -- 15:20 -- OP --     6,7

10 -- Germany -- BB         --3/1/20 -- 11:45 -- MC --     5,8

11 -- Germany -- AA         --3/1/20 -- 09:40 -- OP --     2,7

 

I appreciate your help all 🙂 Thank you in advance!

 

1 ACCEPTED SOLUTION
18 REPLIES 18
Shmuel
Garnet | Level 18

use proc means to get what you want:

 

proc means data=have;
   class location section date;
   var value;
   output out=want(drop=_type_ count) max=;
run;
AK100
Pyrite | Level 9
It almost worked, but thank you in advance!
AK100
Pyrite | Level 9
Thank you @Kurt, your good. You made it look really simple!

By the way. I'am still working on that conversion to data steps thing. I know it makes it easier for you guys to check the data and solve the problems. Hope I can use it soon.
Kurt_Bremser
Super User

For a small dataset like yours, the data2datastep macro is not needed, it's easier to write the data step directly:

data have;
input
  ID :$2.
  Location :$10.
  Section :$2.
  Date :ddmmyy10.
  Time :time5.
  Type :$2.
  Value :commax10.
;
format
  date ddmmyyp10.
  time time5.
  value commax10.2
;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
;

As I said, not rocket science at all.

PeterClemmensen
Tourmaline | Level 20

Here, it is done in a single pass and with the desired result.

 

data have;
input ID $ Location $ Section $ Date :ddmmyy10. Time time5. Type $ Value :comma8.1;
format Time hhmm5.;
format Date ddmmyy10.;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
03 Germany AA 1/1/20 13:50 AU 6,0
04 Germany BB 1/1/20 13:50 AU 5,7
05 Germany BB 2/1/20 15:10 MC 11,3
06 Germany BB 2/1/20 19:10 VW 12,0
07 Germany AA 2/1/20 15:20 OP 6,7
08 Germany AA 2/1/20 15:20 AU 2,0
09 Germany BB 3/1/20 16:16 MC 1,5
10 Germany BB 3/1/20 11:45 MC 5,8
11 Germany AA 3/1/20 09:40 OP 2,7
12 Germany AA 3/1/20 02:40 VW 2,0
;

data want (drop=rc);
   if _N_ = 1 then do;
      declare hash h (dataset : 'have(obs=0)', ordered : 'd', multidata : 'y');
      h.definekey ('Section', 'Value');
      h.definedata (all : 'y');
      h.definedone();
      declare hiter hi ('h');
   end;

   do until (last.Section);
      set have;
      by Section notsorted Date;
      h.add();
   end;

   rc = hi.next();  
   output; 
   rc = hi.prev();
   h.clear();    
run;

Result:

 

ID Location Section Date       Time  Type Value 
01 Germany  BB      01/01/1920 10:10 VW   9.0 
02 Germany  AA      01/01/1920 11:20 AU   7.0 
06 Germany  BB      02/01/1920 19:10 VW   12.0 
07 Germany  AA      02/01/1920 15:20 OP   6.7 
10 Germany  BB      03/01/1920 11:45 MC   5.8 
11 Germany  AA      03/01/1920 9:40  OP   2.7 
Kurt_Bremser
Super User

Hi @PeterClemmensen 

That does not deal with all cases correctly. Feed it this data:

data have;
input ID $ Location $ Section $ Date :ddmmyy10. Time time5. Type $ Value :comma8.1;
format Time hhmm5.;
format Date ddmmyy10.;
datalines;
01 Germany BB 1/1/20 10:10 VW 9,0
02 Germany AA 1/1/20 11:20 AU 7,0
03 Germany AA 1/1/20 13:50 AU 6,0
04 Germany BB 1/1/20 13:50 AU 10,7
;

and you'll get two entries for BB instead of only one.

PeterClemmensen
Tourmaline | Level 20

@Kurt_Bremser isn't that what the OP want data set implies?

Kurt_Bremser
Super User

@PeterClemmensen wrote:

@Kurt_Bremser isn't that what the OP want data set implies?


That's only because the "have" dataset in the OP does not have one single case where a higher value follows later in the day, with another section appearing in between. That's why I subtly changed the input data to have this kind of "edge case".

 

The OP rule is "one value for a combination of section and date", and the hash code throws two values for 01/01 and BB (with my changed data). You would need to fill the hash per date, then iterate through the whole hash object to get all firsts per section, and then clear the hash to make the processing of the next day easier.

AK100
Pyrite | Level 9

@Kurt_Bremser an other quick case, what if I not want to select the top 1 per SECTION and per DAY but the TOP 4 per SECTION and DAY and add like a RANKING column to show the ranking from 1-4. How will your code below look like then? 

 

proc sort data=have;
by section date descending value;
run;

data want;
set have;
by section date;
if first.date;
run;

 

Kurt_Bremser
Super User

@AK100 wrote:

@Kurt_Bremser an other quick case, what if I not want to select the top 1 per SECTION and per DAY but the TOP 4 per SECTION and DAY and add like a RANKING column to show the ranking from 1-4. How will your code below look like then? 

 

proc sort data=have;
by section date descending value;
run;

data want;
set have;
by section date;
if first.date;
run;

 


That's easy. At first.date, set a (retained) counter to 1 (instead of the subsetting if you have now), and increment it else. Use a subsetting if to only output when the counter is le 4.

AK100
Pyrite | Level 9

@Kurt_Bremser Okay, I just figured out this one (which also gives me the desired output). Is is the same as yours or does it look slightly different? 

proc sort data=have;
	by section date descending value;
run;

data top4;
	set have;
	by section date;
	retain ranking;

	if first.date then
		ranking =0;
	ranking +1;

	if ranking <5;
run;

Kurt_Bremser
Super User

This looks good (and is valid), but I would do it slightly different:

data top4;
set have;
by section date;
if first.date
then ranking = 1;
else ranking + 1; /* when you use this type of increment statement, the variable is automatically retained */
if ranking le 4;
run;

This is purely a matter of "code cosmetics"; at first.date, there is one statement less to execute.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1495 views
  • 7 likes
  • 4 in conversation