arde
Obsidian | Level 7

Hi.

 

Is there a way to delete duplicate values within a row?

 

Have

row var1 var2 var3
1 a b b
2 c a c
3 a b c

 

 

Want

row var1 var2 var3
1 a b  
2 c a  
3 a b c

 

 

Thank you

Accepted Solution
Astounding
PROC Star

The complexity of the program will vary depending on how many fields you are comparing, and whether their names really are var1, var2, var3, etc.

 

For the problem as posted, you could use something simple:

 

data want;
   set have;
   if var3 = var2 or var3 = var1 then var3 = ' ';
   if var2 = var1 then var2 = ' ';
run;

I suspect that this will not do in real life, and that the actual problem is a bit more complex.  So describe more of what you are really facing.  How many variables?  What are the variable names?  Are they all character variables?
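If there turn out to be more variables, the same idea could be written with an array instead of one IF statement per pair. This is only a sketch, assuming every variable is character and follows the var1-varN naming from the question; the small HAVE data set is recreated here just so the step can be run on its own.

```sas
/* Sample data copied from the question. */
data have;
   input row var1 $ var2 $ var3 $;
   datalines;
1 a b b
2 c a c
3 a b c
;

data want;
   set have;
   /* Assumption: all variables to compare are character and named var1-var3. */
   array v {*} var1-var3;
   do i = 2 to dim(v);
      /* Compare each variable with every variable to its left. */
      do j = 1 to i - 1;
         /* Blank the later copy so the first occurrence is the one kept. */
         if not missing(v{i}) and v{i} = v{j} then v{i} = ' ';
      end;
   end;
   drop i j;
run;
```

To cover more columns, only the variable list in the ARRAY statement should need to change.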

ballardw
Super User

Does actual order of the values in the variables matter? Meaning could the result for the second row look like:

 

2 a c  

 

And in your real data, are the values other than single characters? There are approaches that may work for single characters that wouldn't work for multi-word strings. That raises the question of whether letter case makes a difference: is 'A' a duplicate of 'a', or is "Apple" a duplicate of "apple"?

arde
Obsidian | Level 7
Actual order doesn't matter. Real data will be more than a single character. Case will be the same.

Thank you!
arde
Obsidian | Level 7

I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.

mkeintz
PROC Star

@arde wrote:

I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.


What if the duplicate is in the middle of the series of variables?  I.e., if your values are

      A  A   B

then setting the duplicate middle variable to blank and subsequently calling CATX('!',var1,var2,var3) still gives a clean result: CATX skips blank arguments along with their delimiters, so you get A!B rather than two consecutive delimiters.  (Plain concatenation with || would keep the gap.)

 

Alternatively, you can skip the blanking step and use CATX iteratively, concatenating only the non-duplicates (using '!' as the delimiter below):

 

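data want (drop=i);
  set have;
  /* variables to scan, using the names from the question */
  array chr {*} var1-var3 ;
  length new_text $60;
  new_text=left(chr{1});
  do i=2 to dim(chr);
    /* append chr{i} only if it is not already a '!'-delimited word in new_text */
    if findw(trim(new_text),trim(chr{i}),'!')=0 then new_text=catx('!',new_text,chr{i});
  end;
run;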
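For reference, a quick way to see how CATX treats a blank argument in the middle of the list is a small DATA _NULL_ step. The literal values and variable names below are made up for illustration only. CATX drops a blank argument together with its delimiter, while plain || concatenation keeps both.

```sas
data _null_;
   length with_catx with_bars $20;
   /* Middle argument is blank, as if a duplicate had been blanked out. */
   with_catx = catx('!', 'A', ' ', 'B');
   /* Same pieces joined by hand with the concatenation operator. */
   with_bars = 'A' || '!' || ' ' || '!' || 'B';
   put with_catx=;   /* with_catx=A!B  */
   put with_bars=;   /* with_bars=A! !B */
run;
```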

 

Tom
Super User

@arde wrote:

I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.


In that case there is no need to remove the duplicates.  Just don't add them to the list.

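data want;
  set have;
  length string $200;
  /* variables to scan, using the names from the question */
  array x var1-var3 ;
  do index=1 to dim(x);
    /* add a value only if it is not already a '|'-delimited word in string; */
    /* trim/strip guard against blank padding affecting the word match       */
    if not indexw(trim(string),strip(x[index]),'|') then
      string=catx('|',string,x[index]);
  end;
  drop index;
run;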
arde
Obsidian | Level 7
Thank you!
