BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

Hi,  is there a way to remove duplicates but keep all fields.. Example below (but keep in mind there are 30 fields in total, and I want to capture all and the table is a huge data warehouse table) :

HAVE:

Acct_No       Name          Date

0001             Mike          Jun-13

0002             Mike          Jun-13

0003             Fred           Apr-13

0003             Fred           May-13

0003             Fred           Jun-13

0004             Will            Sep-13

WANT:

Acct_No       Name          Date

0001             Mike          Jun-13

0002             Mike          Jun-13

0003             Fred           Apr-13

0004             Will            Sep-13

27 REPLIES 27
esjackso
Quartz | Level 8

Does the date matter? IE your example chose the first one ... was that by design?

EJ

podarum
Quartz | Level 8

no it doesn't matter, as long as I get only one by Acct_No

and jpstats, the data is from a server, and it's large that I have to use proc sql (rsubmit)

esjackso
Quartz | Level 8

Are trying to update the table in place or just get a sas dataset to work with?

EJ

podarum
Quartz | Level 8

a dataset to work with

esjackso
Quartz | Level 8

This seems to work with your example assuming that the date is numeric:

proc sql;

  create table want as

  select acct_no, name, min(date) as date

  from have

  group by acct_no, name

  ;

quit;

EJ

jpstats
Calcite | Level 5

I may have misunderstood, but I think he wants the resulting set to have all the fields...

podarum
Quartz | Level 8

that is correct, and since theer are about 30 fields, I don't want to type each one.. a dup may occur though in another field, my example only shows date, but it could happen for any other of the 30 fields... I may have to comprimise.

esjackso
Quartz | Level 8

Even taking the min of just the date may not get rid of all the duplicates if the min is in the table multiple time for acctno and name and other fields changing... if the values dont really matter then proc sort with out= option is the best choice as mentioned in his post.

EJ

jpstats
Calcite | Level 5

Just use -

proc sort

     data = mydataset nodupkey;

     by acct_no name;

run;

Or, if you have to use SQL -

proc sql noprint;

     create table key_keep as

     select distinct acct_no, name, date

     from mydataset

     group by acct_no, name

     having date = min(date);

     create table mydataset2 as

     select a.*

     from mydataset a

     INNER JOIN

     key_keep b

     ON

     a.acct_no = b.acct_no and

     a.name = b.name and

     a.date = b.date;

quit;

I think that should do the trick.

esjackso
Quartz | Level 8

That doesnt actually get what they ask for a couple of reasons:

- doesnt use SQL

- it really is only a no dup on acct_no since the dates are actually unique

jpstats
Calcite | Level 5

Right, good catch.

PGStats
Opal | Level 21

With SQL you will have somehow to tell the procedure which value of your fields to keep. In your example, it seems that you want to keep the minimum Date. If your choice of value to keep depends on the order of the data in the table (e.g. the first Date), then SQL is not a good tool to perform the operation and proc SORT or a datastep would do a better job.

PG

PG
cooker
Calcite | Level 5

You may try the following:

[insert your passthru code here]

proc sql;

create table newtable as

select a.acct_no, a.name, a.date

from dataset a

where a.date = (select min(a1.date)

    from dataset a1

    where a1.name = a.name

    and a1.acct_no = a.acct_no)

quit;

You'll  want to list all of your fields of course or use the a.* method. There are other methods which are available to Oracle DB but this should work on most platforms.

Good luck!

cooker
Calcite | Level 5

Yes, if you could have differences in all the columns; you'd have to replicate the subquery for each column. This may be something which knocks the snot out of your DB server and may timeout.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 27 replies
  • 1534 views
  • 6 likes
  • 7 in conversation