BookmarkSubscribeRSS Feed
TomiKong
Fluorite | Level 6

My dataset is shown as below,

case_numbercase_value
a1
a2
a3
a4
b2
b3
b4
b5
c5
c6
c7
c9

I want to extract the observations whose case _value is maximal across each case_number. For this dataset, I want to get one subset like this.

case_numbercase_value
a4
b5
c

9

How to achieve this by coding?

Thanks!

11 REPLIES 11
Alpay
Fluorite | Level 6

data have;

input case_number  $ case_value;

datalines;

a 1

a 2

a 3

a 4

b 2

b 3

b 4

b 5

c 5

c 6

c 7

c 9

;

run;

proc sql;

  create table want as

  select distinct case_number,

  max(case_value) as case_value

  from have

  group by case_number;

quit;

PGStats
Opal | Level 21

Why use DISTINCT in this query?

PG

PG
Alpay
Fluorite | Level 6

Thanks for the question since it made me think what's happening with proc sql calls.

It will return the same result set without DISTINCT in select statement with one less operation.

I looked at the method chosen by proc sql with and without DISTINCT keyword in select statement.

When DISTINCT is used "sqxunqs" operation is run as an extra operation to de-dup records which are already unique before this operation.

I should be more careful next time when using DISTINCT keyword especially with large data sets.

Method without DISTINCT:

proc sql _method;

create table want as

  select case_number,

  max(case_value) as case_value

  from have

  group by case_number;

quit;

NOTE: SQL execution methods chosen are:

      sqxcrta

          sqxsumg

              sqxsort

                  sqxsrc( WORK.HAVE )

Method with DISTINCT:

proc sql _method;

create table want as

  select distinct case_number,

  max(case_value) as case_value

  from have

  group by case_number;

quit;

NOTE: SQL execution methods chosen are:

      sqxcrta

          sqxunqs

              sqxsumg

                  sqxsort

                      sqxsrc( WORK.HAVE )

Keith
Obsidian | Level 7

Another option is Proc Summary.  The 2nd example is based on the maximum value being returned when an ID statement is used.

proc summary data=have nway;

class case_number;

var case_value;

output out=want (drop=_:) max=;

run;


or

proc summary data=have nway;

class case_number;

id case_value;

output out=want (drop=_:);

run;

Linlin
Lapis Lazuli | Level 10

or no SQL solution:

data have;

input case_number  $ case_value;

datalines;

a 1

a 2

a 3

a 4

b 2

b 3

b 4

b 5

c 5

c 6

c 7

c 9

;

proc sort;

by case_number  $ case_value;

data want;

   set have;

   by case_number;

   if last.case_number;

proc print;run;

Linlin

MikeZdeb
Rhodochrosite | Level 12

hi ... the SQL and SUMMARY solutions seem most "economical", but if you are going to sort, you can always sort again and take advantage of the default EQUALS option that maintains order withing sort groups ...

proc sort data=have out=want;

by case_number descending case_value;

run;


proc sort data=want nodupkey;

by case_number;

run;

Linlin
Lapis Lazuli | Level 10

Hi Mike,

Thank you! I need to practice my Mapping skill learned from you. I am going to post a question at SAS/Graph and ODS Graphics. I am sure you will help me.

- Linlin

Haikuo
Onyx | Level 15

FWIW, a hash() solution:

data have;

input case_number$    case_value;

cards;

a    1

a    2

a    3

a    4

b    2

b    3

b    4

b    5

c    5

c    6

c    7

c    9

;

data _null_;

  if 0 then set have;

   dcl hash h(dataset:'have', ordered: 'a',duplicate:'r');

   h.definekey('case_number');

   h.definedata(all:'y');

   h.definedone();

    rc=h.output(dataset:'want');

   stop;

run;

proc print;run;

Haikuo

Mirisage
Obsidian | Level 7

Hi Haikuo,

I am totally new to hash. Hash looks like a vey handy appraoch.

Coud you please elaborate a bit how you have built the logic in your hash code (time permitting).

This simple data set looks a good opportunity to understand it.

Thanks

Mirisage

Haikuo
Onyx | Level 15

Hi Mirisage,

Hash() object is actually more straightforward than many other data step statements, such as 'set', 'merge', 'update', it is just a little different. And you are correct, Above example is ideal to take a peek into hash():

data _null_;

  if 0 then set have; /*This is to set hash() environment, eg variables, before you can define a hash(), in this case, you can also use: call missing (case_number, case_value);*/

   dcl hash h(dataset:'have', ordered: 'a',duplicate:'r'); /*This is to define hash(), named 'h', loaded from 'have', with an order 'a' as 'ascending', if there are duplicates by 'key', then only keep the last record ('r' denotes replace)*/

   h.definekey('case_number'); /*Define search key 'case_number'*/

   h.definedata(all:'y'); /*Define data element, all :'y' denoting with every thing from 'have'

   h.definedone(); /*the end of hash() definition*/

    rc=h.output(dataset:'want'); /output hash 'h' to dataset 'want'*/

   stop;

run;

Here is the help link, where I 've learned most of my hash():

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002576871.htm

Also there are tons of sugi paper on hash(), pay extra attention on those written by Paul dorfman.

Regards,

Haikuo

Mirisage
Obsidian | Level 7

Hi Haikuo,

Thank you very much for your time and expertise.

Learning resources and your annotated comments are very much appreciated.

Best regards

Mirisage

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1555 views
  • 3 likes
  • 8 in conversation