SAS Programming

suraestar · Posted 04-24-2013 08:16 AM

Dear All,

I need the NODUPKEY functionality in PROC SQL Similar to PROC SORT.

The below code give you unique values of MAKE from the CARS dataset with ALL VARIABLES.
proc sort data=sashelp.cars out=cars nodupkey;
by make;
run;

whereas in PROC SQL the distinct function will give you only the MAKE variable in the output dataset.
proc sql;
create table cars as
select distinct make from sashelp.cars;
quit;

Is it possible to get UNIQUE values + All other variables in PROC SQL.

Note: We can use the monotonic function to get the above thing. but it is not the efficient way.

Could you please share your ideas.

thanks
Surendar

Patrick · Posted 04-24-2013 08:40 AM

I would have to ask what's the value of having additional variables which are just picked from one of the rows with multiple keys. What are you going to do with it?

To get a similar result using Proc SQL your approach using monotonic() appears to be to most efficient way from the amount of code you have to write.

proc sql;

create table test(drop=n) as

select *, monotonic() as n

from sashelp.cars

group by make

having max(n) = n

;

quit;

Else you would have to use the max() or min() function on every variable:

proc sql;

create table test as

select make, max(model) as model, max(.....

from sashelp.cars

group by make

;

quit;

Ksharp · Posted 04-24-2013 08:43 AM

You need make a id variable on your own.

ods listing close;
ods output sql_results=temp;
proc sql number;
select * from sashelp.cars;
quit;
proc sql;
create table want as
 select * from temp
  group by make
   having row=min(row);
quit;
ods listing;

Ksharp

suraestar · Posted 04-24-2013 09:01 AM

Hi Patrick,

My point is that I am filtering the dataset which has no duplicate records based on one or more key variable in it . Ofcourse I need the other variables too in my target dataset.This can be done in PROC SORT - NODUPKEY statement..

I need a similar to statement in proc sql to do the job..

BTW if we use the monotonic funcation along with MAX.. It takes more time than PROC SORT and also we will end up with following note

NOTE: The query requires remerging summary statistics back with the original data

Haikuo · Posted 04-24-2013 09:14 AM

If efficiency is your concern, consider using Hash() objects.

Haikuo

suraestar · Posted 04-24-2013 11:28 AM

Hi. Yes you are right I was looking for the efficiency. But I m not aware of hash concepts. Can you please guide me about hash.

Haikuo · Posted 04-24-2013 12:03 PM

Play with this example code to see how much you can get from it, it basically output two records based on key variable 'sex':

data _null_;

if 0 then set sashelp.class;

declare hash h(dataset: 'sashelp.class');

h.definekey('sex');

h.definedata(all:'yes');

h.definedone();

h.output(dataset:'want');

stop;

run;

Haikuo

Tom · Posted 04-24-2013 09:54 AM

If PROC SORT NODUPKEY does the job why do you need to do it in PROC SQL?

Sounds kind of like requesting how to get mean values from PROC FREQ instead of using PROC MEANS.

suraestar · Posted 04-24-2013 10:45 AM

HI All, My idea is about finding the best approach and also finding the alternate approach which is efficient. This post will be helpful and interesting for people who thinks the either way to do the task. I had this topic in mind bcoz If I (may be we) have a task to delete duplicate records in a dataset. My mind directly chooses proc sort with nodupkey. But I like to think or I want the alternate solution to do This. May be I m wrong but this can help me in exploring more. Thank you all for your comments and ideas.

Patrick · Posted 04-24-2013 07:57 PM

You are NOT removing duplicate records using Proc Sort / Nodupkey unless you add all variables to the 'by' statement - and then it does the same like a SQL Distinct.

You are removing duplicate 'keys' using Nodupkey - all the other variables not in the by group are then just coming from one of the multiple records with the same key values. You have zero control which row will be selected so what's the value of having these additional variables?

There have already been many discussions in this forum around de-duplicating records.

SAS Programming

NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

Re: NODUPKEY feature in PROC SQL?

PROC SQL - Insert

PROC SQL- Order by

Performance of sort nodupkey vs sql select distinct

Proc sort nodupkey

How to use proc sort nodupkey??

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...