BookmarkSubscribeRSS Feed
UrvishShah
Fluorite | Level 6

Hello

I have the data like this

Data have;

input name $ age;

cards;

A 20

A 21

B 22

C 23

;

nw i wan to eliminate the duplicate obs like this

proc sort data = have nodupkey;

by name;

run;

nw is there any other way to do the same by using Proc Sql in order to reduce the processing time

9 REPLIES 9
art297
Opal | Level 21

You could always use select distint(name) as onename,age, in a proc sql call, but I doubt if it would improve the processing time.

Doc_Duke
Rhodochrosite | Level 12

Urvish,

First, it helps the rest of us if you use the topic of your question as the subject (rather than your name).

Use the DISTINCT option in SQL for this.  Documentation in the manual.  Also search the SAS papers for NODUPKEY and DISTINCT to find some good descriptions of their use.

Be aware, SQL is not always faster and may not provide the same results (you should get the same number of record, but not necesssarily the same records).

Doc Muhlbaier

Duke

Howles
Quartz | Level 8

All four rows in the example are different, so DISTINCT is going to return all four.

UrvishShah wrote:

Hello

I have the data like this

Data have;

input name $ age;

cards;

A 20

A 21

B 22

C 23

;

nw i wan to eliminate the duplicate obs like this

proc sort data = have nodupkey;

by name;

run;

nw is there any other way to do the same by using Proc Sql in order to reduce the processing time

art297
Opal | Level 21

Howard,

Not if one first selects just based on name (e.g.)

proc sql noprint;

  create table want as

    select distinct(name) as onename

      from have

        group by name

          having count(*) gt 0

;

quit;

and then joins the result back with the original table.  However, as mentioned, I can't see it outperforming proc sort.

Howles
Quartz | Level 8

Seems to me that you can omit the GROUP BY and HAVING clauses and get the same WANT; the HAVING condition can never be false. Then joining this result to the original will get you back to ... the original.

This will work as long as there are no ties:

  create table want as

    select *

      from have

        group by name

          having age EQ max(age)

;

art297 wrote:

Howard,

Not if one first selects just based on name (e.g.)

proc sql noprint;

  create table want as

    select distinct(name) as onename

      from have

        group by name

          having count(*) gt 0

;

quit;

and then joins the result back with the original table.  However, as mentioned, I can't see it outperforming proc sort.

art297
Opal | Level 21

Howard,

The following will work with or without ties, but uses the undocumented monotonic function:

Data have;

  input name $ age;

  cards;

A 20

A 20

A 21

B 22

C 23

;

proc sql noprint;

  create table want as

    select distinct(name) as onename

      from have

        group by name

          having min(monotonic()) gt 0

;

quit;

But, I still don't see it necessarily working any faster than using proc sort.

Ksharp
Super User

Art.T

I must have to say using monotonic() is dangerous for proc sql, this has been mentioned in sas documentation.

It will result undesirable result,especially for complex SQL.

Ksharp

Howles
Quartz | Level 8

I'm pretty sure that MONOTONIC always returns positive values, so the HAVING and GROUP BY clauses here don't do anything.

DISTINCT is not a function, so the parens around NAME also have no effect.

We still need the AGE column to emulate the SORT/NODUPKEY results. This

  create table want as

    select name , max(age) as age

      from have

        group by name

;

but only in the special case of a single satellite column. Were there multiple satellite columns (like AGE and SEX), no go.

art297 wrote:

Howard,

The following will work with or without ties, but uses the undocumented monotonic function:

Data have;

  input name $ age;

  cards;

A 20

A 20

A 21

B 22

C 23

;

proc sql noprint;

  create table want as

    select distinct(name) as onename

      from have

        group by name

          having min(monotonic()) gt 0

;

quit;

But, I still don't see it necessarily working any faster than using proc sort.

UrvishShah
Fluorite | Level 6

Sorry to all for displaying my name insted of the original topic

I am really sorry for that

After posted my question i learned a lot .

Thanks all .

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 956 views
  • 0 likes
  • 5 in conversation