BookmarkSubscribeRSS Feed
cosmid
Lapis Lazuli | Level 10

Hi,

I have a dataset that has a sas7bndx associated with it. I couldn't do a proc sort with nodupkey force because it will delete the sas7bndx. 

 

My questions are:

1. If a column is named ID, what's the best way to list out the duplicated IDs?

2. How do I list out the duplicated IDs with another column, example, ID and Month.

    1   Jan

    1   Mar

    3   May

    3   Jun

3. How do I remove the duplicates when there's a sas7bndx associated with it?

 

Thanks!

7 REPLIES 7
Tom
Super User Tom
Super User

What is stopping you from just re-creating the index?

 

cosmid
Lapis Lazuli | Level 10

I don't know what var(s) the original index is based on. Is there a way to find that out? And is there a way to remove the duplicates while not modifying the index file?

Reeza
Super User

Run a proc contents on the data set to see your indexes.

 

 

Patrick
Opal | Level 21

You could create an empty table structure with all the indexes and constraints based on your source table. Some small modification to the code I've shared here should do the trick.

The do with your source table as you wish. In the end append the table to your empty table structure which still got all the indexes and constraints.

 

You can of course also use approaches that change the table in-place ...but if there are a lot of duplicates (=deletes) then you'll end up with a table that contains a lot of rows with logical deletes (=which add to the file size).

Tom
Super User Tom
Super User

Try using the MODIFY statement.

You could use a HASH object to identify the duplicates.

Let's create an example dataset that has some duplicate data.

data class (index=(age));
  set sashelp.class sashelp.class;
run;

Now let's remove the duplicate observations.  In this case we can just use NAME as the set of key variables.

data class ;
  modify class;
  if _n_=1 then do;
    declare hash h();
	rc=h.definekey('name');
	rc=h.definedata('name');
	rc=h.definedone();
  end;
  if h.add() then remove;
run;

Tom_0-1652388178334.png

Tom_0-1652388274191.png

 

 

sbxkoenk
SAS Super FREQ

Hello @cosmid ,

 

Not sure whether you can actually do something with this , but posting it anyway :

 

Getting all duplicates within a SAS data set
Posted 01-29-2015 12:52 PM | by EricCai (91388 views)
https://communities.sas.com/t5/SAS-Communities-Library/Getting-all-duplicates-within-a-SAS-data-set/...

 

Thanks,

Koen

sbxkoenk
SAS Super FREQ

And ... forgive me for this reply as it is a little bit off-topic 

, but I want people reading this topic thread to know about :

memoryFormat="DVR" (duplicate value reduction memory format)

 

SAS Viya System Programming Guide
Table Action Set: Syntax
upload Action
https://go.documentation.sas.com/doc/en/pgmsascdc/v_014/caspg/cas-table-upload.htm

 

You can save a lot of ( in-memory ) space that way.

Thanks,

Koen

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1217 views
  • 2 likes
  • 5 in conversation