BookmarkSubscribeRSS Feed
jpstats
Calcite | Level 5

Copy & paste the following:

data mydataset;

  input acct_no name $ date date9.;

datalines;

0001 Mike 01JUN2013

0002 Mike 01JUN2013

0003 Fred 01APR2013

0003 Fred 01MAY2013

0003 Fred 01JUN2013

0004 Will 01SEP2013

run;

proc sql noprint;

     create table key_keep as

     select distinct acct_no, name, date

     from mydataset

     group by acct_no, name

     having date = min(date);

     create table mydataset2 as

     select a.*

     from mydataset a

     INNER JOIN

     key_keep b

     ON

     a.acct_no = b.acct_no and

     a.name = b.name and

     a.date = b.date;

quit;

The output (mydataset2) should match your specification.

Haikuo
Onyx | Level 15

Well, since nobody has had mentioned, and it seems to work (Proc SQL only, and regardless of the quantity of variables), however, it will have to involve an undocumented feature, so not recommend for production usage.

proc sql;

  create table no_dup as

    select * from have

   group by acct_no, name

     having monotonic()=min(monotonic());

  quit;

Haikuo

esjackso
Quartz | Level 8

Very interesting ... I will have to file this trick away!

EJ

esjackso
Quartz | Level 8

So I tried the trick out and must be doing something wrong.

I was expecting the same results from a proc sort nodupkey process but didnt get that. What am I missing? The made up data is attached

LOG:

15  proc sql;

16      create table no_dup as

17      select *

18      from duptest

19      group by name, date

20      having monotonic()=min(monotonic());

NOTE: The query requires remerging summary statistics back with the

      original data.

NOTE: Table WORK.NO_DUP created, with 1 rows and 5 columns.

21  quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time          0.01 seconds

      cpu time            0.01 seconds

22  proc sort data=duptest out=out nodupkeys; by name date; run;

NOTE: There were 12 observations read from the data set WORK.DUPTEST.

NOTE: 1 observations with duplicate key values were deleted.

NOTE: The data set WORK.OUT has 11 observations and 5 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time          0.00 seconds

      cpu time            0.00 seconds

Tom
Super User Tom
Super User

It is probably better to use the MONOTONIC() function in a subquery and assign it to a variable rather than try to call the function in two different places in the same SQL code.

create table no_dup (drop=_n_) as

   select *

   from  (select *,monotonic() as _n_ from duptest)

   group by name,date

   having _n_ = min(_n_)

   order by name,date

;

jpstats
Calcite | Level 5

This is better, though the subquery part doesn't appear to be necessary.  I think with your example the result of the monotonic() function being executed prior to the having clause is that the group by statement is able to kick in and take effect with the summary function.  When it's part of the having clause it appears to evaluate across all observations without any grouping.

esjackso
Quartz | Level 8

Thanks that does seem to work for my example. My goal was to understand this method see it was new to me. Something to file away for future use!

23   proc sql;

24      create table no_dup (drop=_n_) as

25      select *

26      from  (select *,monotonic() as _n_ from duptest)

27      group by name,date

28      having _n_ = min(_n_)

29      order by name,date

30   ;

NOTE: The query requires remerging summary statistics back with the

      original data.

NOTE: Table WORK.NO_DUP created, with 11 rows and 5 columns.

31   quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.02 seconds

      cpu time            0.04 seconds

32

33   proc sort data=duptest out=out nodupkeys; by name date; run;

NOTE: There were 12 observations read from the data set WORK.DUPTEST.

NOTE: 1 observations with duplicate key values were deleted.

NOTE: The data set WORK.OUT has 11 observations and 5 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

Haikuo
Onyx | Level 15

Strongly agree with Tom, and Thanks to Eric for pointing this out. I only did one test run on the data presented, and it seems to work, I guess I just got lucky:

data mydataset;

  input acct_no name $ date date9.;

datalines;

0001 Mike 01JUN2013

0002 Mike 01JUN2013

0003 Fred 01APR2013

0003 Fred 01MAY2013

0003 Fred 01JUN2013

0004 Will 01SEP2013

run;

proc sql;

  create table no_dup as

  select *

  from mydataset

  group by acct_no, name

  having monotonic()=min(monotonic());

quit;

Haikuo

cooker
Calcite | Level 5

These are great ideas; but if he's using a passthru query, will any these SAS functions work?

Haikuo
Onyx | Level 15

Not good for pass-through. Where did OP mention the target is RDBMS? The thread is getting long, I must have missed it.

cooker
Calcite | Level 5

3.Re: remove duplicates but keep all fields (using proc sql)

podarumApprentice

Currently Being Moderated               

no it doesn't matter, as long as I get only one by Acct_No

and jpstats, the data is from a server, and it's large that I have to use proc sql (rsubmit)

esjackso
Quartz | Level 8

Its been awhile since I used rsubmit but isnt that only for remote submits to SAS on a server? Which wouldnt be the same as a pass thru query to a different SQL platform.

Eric

Haikuo
Onyx | Level 15

Yes, Eric. I use "rsubmit" on a daily basis, mostly to a server, once for while to some PCs which have SAS running (at least some sort of spawner is active, connect spawner, job spawner or object spawner, not sure). I believe "rsubmit" is for SAS/CONNECT.

Haikuo

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 27 replies
  • 1796 views
  • 6 likes
  • 7 in conversation