BookmarkSubscribeRSS Feed
Saikiran_Mamidi
Obsidian | Level 7

Data have;

input id;

cards;

1

1

1

2

2

44

44

44

44

; run;

 

duplicate records in output dataset by using only proc sql?

like below

data want

id

1

1

2

44

44

44

can i get code for this ? thanks in advance.

9 REPLIES 9
Saikiran_Mamidi
Obsidian | Level 7

Data have;

input id;

cards;

1

1

1

2

2

44

44

44

44

; run;

 

duplicate records in output dataset by using only proc sql?

like below

data want

id

1

1

2

44

44

44

can i get code for this ? thanks in advance.

LinusH
Tourmaline | Level 20

Why limit yourself to SQL?

What have you tried so far?

Data never sleeps
yabwon
Onyx | Level 15

Something like this:

Data have;
input id;
cards;
1
1
1
2
2
44
44
44
44
; run;


proc sql;
  create table tmp as
  select id, monotonic() as i from have
  order by id
  ;

  select a.* 
  from
  tmp as a
  group by a.id
  having a.i > min(a.i)
  ;

  delete table tmp;
quit;

Not the very optimal I'd say.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15
You still have duplicates in Want, haven't you?

Bart
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mkeintz
PROC Star

What if, in addition to variable ID, you have other "satellite" variables A, B, and C, which vary within a given ID?  What criterion would you use to select which records are the extra ones?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DavePrinsloo
Pyrite | Level 9

Well, if have has 600 million records but only 1 million unique key value, you would would want to avoid sorting the 600 million records.

If that is the scenario, you could read WANT sequentially only once, use a datastep with hashing to look up whether you have a match.  

If so, insert the first match into the hash table (of course the other non-key  saved variables might not be want you want).

At then at the end you output the hash table to a physical table.

 

You need to ensure that the million full rows will fit into memory.     If that is not possible then

create a macro that read HAVE in n pieces using options firstobs and obs.   

Then you need a final part that re-reads the reduced pieces and consolidates them into a final HAVE.

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1827 views
  • 2 likes
  • 6 in conversation