BookmarkSubscribeRSS Feed
Saikiran_Mamidi
Obsidian | Level 7

Data have;

input id;

cards;

1

1

1

2

2

44

44

44

44

; run;

 

duplicate records in output dataset by using only proc sql?

like below

data want

id

1

1

2

44

44

44

can i get code for this ? thanks in advance.

9 REPLIES 9
Saikiran_Mamidi
Obsidian | Level 7

Data have;

input id;

cards;

1

1

1

2

2

44

44

44

44

; run;

 

duplicate records in output dataset by using only proc sql?

like below

data want

id

1

1

2

44

44

44

can i get code for this ? thanks in advance.

LinusH
Tourmaline | Level 20

Why limit yourself to SQL?

What have you tried so far?

Data never sleeps
yabwon
Onyx | Level 15

Something like this:

Data have;
input id;
cards;
1
1
1
2
2
44
44
44
44
; run;


proc sql;
  create table tmp as
  select id, monotonic() as i from have
  order by id
  ;

  select a.* 
  from
  tmp as a
  group by a.id
  having a.i > min(a.i)
  ;

  delete table tmp;
quit;

Not the very optimal I'd say.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15
You still have duplicates in Want, haven't you?

Bart
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mkeintz
PROC Star

What if, in addition to variable ID, you have other "satellite" variables A, B, and C, which vary within a given ID?  What criterion would you use to select which records are the extra ones?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DavePrinsloo
Pyrite | Level 9

Well, if have has 600 million records but only 1 million unique key value, you would would want to avoid sorting the 600 million records.

If that is the scenario, you could read WANT sequentially only once, use a datastep with hashing to look up whether you have a match.  

If so, insert the first match into the hash table (of course the other non-key  saved variables might not be want you want).

At then at the end you output the hash table to a physical table.

 

You need to ensure that the million full rows will fit into memory.     If that is not possible then

create a macro that read HAVE in n pieces using options firstobs and obs.   

Then you need a final part that re-reads the reduced pieces and consolidates them into a final HAVE.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1591 views
  • 2 likes
  • 6 in conversation