BookmarkSubscribeRSS Feed
cosmid
Lapis Lazuli | Level 10

Hi,

I have a dataset that has a sas7bndx associated with it. I couldn't do a proc sort with nodupkey force because it will delete the sas7bndx. 

 

My questions are:

1. If a column is named ID, what's the best way to list out the duplicated IDs?

2. How do I list out the duplicated IDs with another column, example, ID and Month.

    1   Jan

    1   Mar

    3   May

    3   Jun

3. How do I remove the duplicates when there's a sas7bndx associated with it?

 

Thanks!

7 REPLIES 7
Tom
Super User Tom
Super User

What is stopping you from just re-creating the index?

 

cosmid
Lapis Lazuli | Level 10

I don't know what var(s) the original index is based on. Is there a way to find that out? And is there a way to remove the duplicates while not modifying the index file?

Reeza
Super User

Run a proc contents on the data set to see your indexes.

 

 

Patrick
Opal | Level 21

You could create an empty table structure with all the indexes and constraints based on your source table. Some small modification to the code I've shared here should do the trick.

The do with your source table as you wish. In the end append the table to your empty table structure which still got all the indexes and constraints.

 

You can of course also use approaches that change the table in-place ...but if there are a lot of duplicates (=deletes) then you'll end up with a table that contains a lot of rows with logical deletes (=which add to the file size).

Tom
Super User Tom
Super User

Try using the MODIFY statement.

You could use a HASH object to identify the duplicates.

Let's create an example dataset that has some duplicate data.

data class (index=(age));
  set sashelp.class sashelp.class;
run;

Now let's remove the duplicate observations.  In this case we can just use NAME as the set of key variables.

data class ;
  modify class;
  if _n_=1 then do;
    declare hash h();
	rc=h.definekey('name');
	rc=h.definedata('name');
	rc=h.definedone();
  end;
  if h.add() then remove;
run;

Tom_0-1652388178334.png

Tom_0-1652388274191.png

 

 

sbxkoenk
SAS Super FREQ

Hello @cosmid ,

 

Not sure whether you can actually do something with this , but posting it anyway :

 

Getting all duplicates within a SAS data set
Posted 01-29-2015 12:52 PM | by EricCai (91388 views)
https://communities.sas.com/t5/SAS-Communities-Library/Getting-all-duplicates-within-a-SAS-data-set/...

 

Thanks,

Koen

sbxkoenk
SAS Super FREQ

And ... forgive me for this reply as it is a little bit off-topic 

, but I want people reading this topic thread to know about :

memoryFormat="DVR" (duplicate value reduction memory format)

 

SAS Viya System Programming Guide
Table Action Set: Syntax
upload Action
https://go.documentation.sas.com/doc/en/pgmsascdc/v_014/caspg/cas-table-upload.htm

 

You can save a lot of ( in-memory ) space that way.

Thanks,

Koen

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1331 views
  • 2 likes
  • 5 in conversation