turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- How to get one subset like this?

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 07:44 AM

My dataset is shown as below,

case_number | case_value |

a | 1 |

a | 2 |

a | 3 |

a | 4 |

b | 2 |

b | 3 |

b | 4 |

b | 5 |

c | 5 |

c | 6 |

c | 7 |

c | 9 |

I want to extract the observations whose case _value is maximal across each case_number. For this dataset, I want to get one subset like this.

case_number | case_value |

a | 4 |

b | 5 |

c | 9 |

How to achieve this by coding?

Thanks!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 07:57 AM

data have;

input case_number $ case_value;

datalines;

a 1

a 2

a 3

a 4

b 2

b 3

b 4

b 5

c 5

c 6

c 7

c 9

;

run;

proc sql;

create table want as

select distinct case_number,

max(case_value) as case_value

from have

group by case_number;

quit;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 08:02 AM

Why use DISTINCT in this query?

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 08:35 AM

Thanks for the question since it made me think what's happening with proc sql calls.

It will return the same result set without DISTINCT in select statement with one less operation.

I looked at the method chosen by proc sql with and without DISTINCT keyword in select statement.

When DISTINCT is used "sqxunqs" operation is run as an extra operation to de-dup records which are already unique before this operation.

I should be more careful next time when using DISTINCT keyword especially with large data sets.

Method without DISTINCT:

proc sql _method;

create table want as

select case_number,

max(case_value) as case_value

from have

group by case_number;

quit;

NOTE: SQL execution methods chosen are:

sqxcrta

sqxsumg

sqxsort

sqxsrc( WORK.HAVE )

Method with DISTINCT:

proc sql _method;

create table want as

select distinct case_number,

max(case_value) as case_value

from have

group by case_number;

quit;

NOTE: SQL execution methods chosen are:

sqxcrta

sqxunqs

sqxsumg

sqxsort

sqxsrc( WORK.HAVE )

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 08:29 AM

Another option is Proc Summary. The 2nd example is based on the maximum value being returned when an ID statement is used.

**proc** **summary** data=have nway;

class case_number;

var case_value;

output out=want (drop=_ max=;

**run**;

or

**proc** **summary** data=have nway;

class case_number;

id case_value;

output out=want (drop=_;

**run**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 08:38 AM

or no SQL solution:

data have;

input case_number $ case_value;

datalines;

a 1

a 2

a 3

a 4

b 2

b 3

b 4

b 5

c 5

c 6

c 7

c 9

;

proc sort;

by case_number $ case_value;

data want;

set have;

by case_number;

if last.case_number;

proc print;run;

Linlin

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 10:02 AM

hi ... the SQL and SUMMARY solutions seem most "economical", but if you are going to sort, you can always sort again and take advantage of the default EQUALS option that maintains order withing sort groups ...

**proc sort data=have out=want;**

**by case_number descending case_value;**

**run;**

**proc sort data=want nodupkey;**

**by case_number;**

**run;**

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 11:28 AM

Hi Mike,

Thank you! I need to practice my Mapping skill learned from you. I am going to post a question at SAS/Graph and ODS Graphics. I am sure you will help me.

- Linlin

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 10:21 AM

FWIW, a hash() solution:

data have;

input case_number$ case_value;

cards;

a 1

a 2

a 3

a 4

b 2

b 3

b 4

b 5

c 5

c 6

c 7

c 9

;

data _null_;

if 0 then set have;

dcl hash h(dataset:'have', ordered: 'a',duplicate:'r');

h.definekey('case_number');

h.definedata(all:'y');

h.definedone();

rc=h.output(dataset:'want');

stop;

run;

proc print;run;

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 02:43 PM

Hi Haikuo,

I am totally new to hash. Hash looks like a vey handy appraoch.

Coud you please elaborate a bit how you have built the logic in your hash code (time permitting).

This simple data set looks a good opportunity to understand it.

Thanks

Mirisage

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 03:28 PM

Hi Mirisage,

Hash() object is actually more straightforward than many other data step statements, such as 'set', 'merge', 'update', it is just a little different. And you are correct, Above example is ideal to take a peek into hash():

data _null_;

if 0 then set have; /*This is to set hash() environment, eg variables, before you can define a hash(), in this case, you can also use: call missing (case_number, case_value);*/

dcl hash h(dataset:'have', ordered: 'a',duplicate:'r'); /*This is to define hash(), named 'h', loaded from 'have', with an order 'a' as 'ascending', if there are duplicates by 'key', then only keep the last record ('r' denotes replace)*/

h.definekey('case_number'); /*Define search key 'case_number'*/

h.definedata(all:'y'); /*Define data element, all :'y' denoting with every thing from 'have'

h.definedone(); /*the end of hash() definition*/

rc=h.output(dataset:'want'); /output hash 'h' to dataset 'want'*/

stop;

run;

Here is the help link, where I 've learned most of my hash():

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002576871.htm

Also there are tons of sugi paper on hash(), pay extra attention on those written by Paul dorfman.

Regards,

Haikuo

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

07-06-2012 09:22 PM

Hi Haikuo,

Thank you very much for your time and expertise.

Learning resources and your annotated comments are very much appreciated.

Best regards

Mirisage