How to get one subset like this?

TomiKong · Posted 07-06-2012 07:44 AM

My dataset is shown as below,

case_number	case_value
a	1
a	2
a	3
a	4
b	2
b	3
b	4
b	5
c	5
c	6
c	7
c	9

I want to extract the observations whose case _value is maximal across each case_number. For this dataset, I want to get one subset like this.

case_number	case_value
a	4
b	5
c	9

How to achieve this by coding?

Thanks!

Alpay · Posted 07-06-2012 07:57 AM

data have;

input case_number $ case_value;

datalines;

a 1

a 2

a 3

a 4

b 2

b 3

b 4

b 5

c 5

c 6

c 7

c 9

;

run;

proc sql;

create table want as

select distinct case_number,

max(case_value) as case_value

from have

group by case_number;

quit;

PGStats · Posted 07-06-2012 08:02 AM

Why use DISTINCT in this query?

PG

Alpay · Posted 07-06-2012 08:35 AM

Thanks for the question since it made me think what's happening with proc sql calls.

It will return the same result set without DISTINCT in select statement with one less operation.

I looked at the method chosen by proc sql with and without DISTINCT keyword in select statement.

When DISTINCT is used "sqxunqs" operation is run as an extra operation to de-dup records which are already unique before this operation.

I should be more careful next time when using DISTINCT keyword especially with large data sets.

Method without DISTINCT:

proc sql _method;

create table want as

select case_number,

max(case_value) as case_value

from have

group by case_number;

quit;

NOTE: SQL execution methods chosen are:

sqxcrta

sqxsumg

sqxsort

sqxsrc( WORK.HAVE )

Method with DISTINCT:

proc sql _method;

create table want as

select distinct case_number,

max(case_value) as case_value

from have

group by case_number;

quit;

NOTE: SQL execution methods chosen are:

sqxcrta

sqxunqs

sqxsumg

sqxsort

sqxsrc( WORK.HAVE )

Keith · Posted 07-06-2012 08:29 AM

Another option is Proc Summary. The 2nd example is based on the maximum value being returned when an ID statement is used.

proc summary data=have nway;

class case_number;

var case_value;

output out=want (drop=_:) max=;

run;

or

proc summary data=have nway;

class case_number;

id case_value;

output out=want (drop=_:);

run;

Linlin · Posted 07-06-2012 08:38 AM

or no SQL solution:

data have;

input case_number $ case_value;

datalines;

a 1

a 2

a 3

a 4

b 2

b 3

b 4

b 5

c 5

c 6

c 7

c 9

;

proc sort;

by case_number $ case_value;

data want;

set have;

by case_number;

if last.case_number;

proc print;run;

Linlin

MikeZdeb · Posted 07-06-2012 10:02 AM

hi ... the SQL and SUMMARY solutions seem most "economical", but if you are going to sort, you can always sort again and take advantage of the default EQUALS option that maintains order withing sort groups ...

proc sort data=have out=want;

by case_number descending case_value;

run;

proc sort data=want nodupkey;

by case_number;

run;

Linlin · Posted 07-06-2012 11:28 AM

Hi Mike,

Thank you! I need to practice my Mapping skill learned from you. I am going to post a question at SAS/Graph and ODS Graphics. I am sure you will help me.

- Linlin

Haikuo · Posted 07-06-2012 10:21 AM

FWIW, a hash() solution:

data have;

input case_number$ case_value;

cards;

a 1

a 2

a 3

a 4

b 2

b 3

b 4

b 5

c 5

c 6

c 7

c 9

;

data _null_;

if 0 then set have;

dcl hash h(dataset:'have', ordered: 'a',duplicate:'r');

h.definekey('case_number');

h.definedata(all:'y');

h.definedone();

rc=h.output(dataset:'want');

stop;

run;

proc print;run;

Haikuo

Mirisage · Posted 07-06-2012 02:43 PM

Hi Haikuo,

I am totally new to hash. Hash looks like a vey handy appraoch.

Coud you please elaborate a bit how you have built the logic in your hash code (time permitting).

This simple data set looks a good opportunity to understand it.

Thanks

Mirisage

Haikuo · Posted 07-06-2012 03:28 PM

Hi Mirisage,

Hash() object is actually more straightforward than many other data step statements, such as 'set', 'merge', 'update', it is just a little different. And you are correct, Above example is ideal to take a peek into hash():

data _null_;

if 0 then set have; /*This is to set hash() environment, eg variables, before you can define a hash(), in this case, you can also use: call missing (case_number, case_value);*/

dcl hash h(dataset:'have', ordered: 'a',duplicate:'r'); /*This is to define hash(), named 'h', loaded from 'have', with an order 'a' as 'ascending', if there are duplicates by 'key', then only keep the last record ('r' denotes replace)*/

h.definekey('case_number'); /*Define search key 'case_number'*/

h.definedata(all:'y'); /*Define data element, all :'y' denoting with every thing from 'have'

h.definedone(); /*the end of hash() definition*/

rc=h.output(dataset:'want'); /output hash 'h' to dataset 'want'*/

stop;

run;

Here is the help link, where I 've learned most of my hash():

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002576871.htm

Also there are tons of sugi paper on hash(), pay extra attention on those written by Paul dorfman.

Regards,

Haikuo

Mirisage · Posted 07-06-2012 09:22 PM

Hi Haikuo,

Thank you very much for your time and expertise.

Learning resources and your annotated comments are very much appreciated.

Best regards

Mirisage

How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Re: How to get one subset like this?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away