☑ This topic is solved.
arde
Obsidian | Level 7

Hi.

 

Is there a way to delete duplicates from a row?

 

Have

row var1 var2 var3
1 a b b
2 c a c
3 a b c

 

 

Want

row var1 var2 var3
1 a b  
2 c a  
3 a b c

 

 

Thank you

Accepted Solutions
Astounding
Opal | Level 21

The complexity of the program will vary depending on how many fields you are comparing, and whether their names really are var1, var2, var3, etc.

 

For the problem as posted, you could use something simple:

 

data want;
   set have;
   if var3 = var2 or var3 = var1 then var3 = ' ';
   if var2 = var1 then var2 = ' ';
run;

I suspect that this will not do in real life, and that the actual problem is a bit more complex.  So describe more of what you are really facing.  How many variables?  What are the variable names?  Are they all character variables?
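
If there turn out to be more variables, the same idea can be written with an array instead of hard-coded comparisons. The following is only a minimal sketch, assuming all the variables are character and really are named var1-var3 (widen the range as needed); the loop counters i and j are names introduced here for illustration.

```sas
/* Sample data from the question */
data have;
   input row var1 $ var2 $ var3 $;
   datalines;
1 a b b
2 c a c
3 a b c
;

data want (drop=i j);
   set have;
   /* Assumption: character variables named var1-var3 */
   array v {*} var1-var3;
   /* Work backwards so a later copy is blanked and the first copy is kept */
   do i = dim(v) to 2 by -1;
      do j = 1 to i - 1;
         if not missing(v{i}) and v{i} = v{j} then do;
            v{i} = ' ';
            leave;   /* already blanked, no need to compare further */
         end;
      end;
   end;
run;
```

For the three posted rows this should reproduce the Want table from the question.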


7 REPLIES
ballardw
Super User

Does actual order of the values in the variables matter? Meaning could the result for the second row look like:

 

2 a c  

 

And in your real data are the values other than single characters? There are approaches that may work for single characters that wouldn't for multi-word strings. Which raises the question: does letter case make a difference? Is 'A' a duplicate of 'a', or is "Apple" a duplicate of "apple"?

arde
Obsidian | Level 7
Actual order doesn't matter. Real data will be more than a single character. Case will be the same.

Thank you!
arde
Obsidian | Level 7

I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.

mkeintz
Jade | Level 19

@arde wrote:

I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.


What if the duplicate is in the middle of the series of variables?  I.e., if your values are

      A  A   B

then if you set the duplicate middle var to blank and subsequently use CATX('!',var1,var2,var3), you will get A!B: CATX skips blank arguments, so no doubled delimiter appears.  That does require changing the original variables, though.

 

If you would rather leave them unchanged, then you could use CATX iteratively, concatenating only the non-duplicates (using '!' as the delimiter below):

 

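data want (drop=i);
  set have;
  /* character variables to combine; names taken from the question */
  array chr {*} var1 var2 var3 ;
  length new_text $60;
  new_text=left(chr{1});
  do i=2 to dim(chr);
    /* append the value only if it is not already a '!'-delimited word in new_text */
    if findw(trim(new_text),trim(chr{i}),'!')=0 then new_text=catx('!',new_text,chr{i});
  end;
run;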
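
As a quick check of how CATX treats a blank argument (relevant when a duplicate in the middle has been set to blank), this minimal sketch writes the result to the log. The literal values and the variable name result are made up for illustration.

```sas
data _null_;
   length result $20;
   /* the middle argument is blank, as a blanked-out duplicate would be */
   result = catx('!', 'A', ' ', 'B');
   put result=;
run;
```

CATX is documented to skip blank arguments, so the log should show result=A!B, with a single delimiter between A and B.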

 

Tom
Super User

@arde wrote:

I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.


In that case there is no need to remove the duplicates.  Just don't add them to the list.

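data want;
  set have;
  length string $200;
  /* var1-var3 matches the posted data; extend the range for more variables */
  array x var1-var3 ;
  do index=1 to dim(x);
    /* skip values that already appear as a '|'-delimited word in string */
    if not indexw(string,x[index],'|') then
      string=catx('|',string,x[index]);
  end;
  drop index;
run;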
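
A variation on the same idea, sketched under the assumption that the variables are character and named var1-var3: instead of searching the growing string, compare each value with the earlier array elements. This avoids depending on how a word search treats trailing blanks or values that happen to contain the delimiter. The sample words and the names i, j and dup are made up for illustration.

```sas
/* Made-up multi-character sample values */
data have;
   input row var1 $ var2 $ var3 $;
   datalines;
1 apple pear pear
2 plum apple plum
3 apple pear plum
;

data want (drop=i j dup);
   set have;
   length string $200;
   array x {*} var1-var3;
   do i = 1 to dim(x);
      /* dup = 1 when the same value occurs in an earlier variable */
      dup = 0;
      do j = 1 to i - 1;
         if x{i} = x{j} then dup = 1;
      end;
      /* CATX skips blank values, so missing entries add nothing */
      if not dup then string = catx('|', string, x{i});
   end;
run;
```

For the sample rows this should give apple|pear, plum|apple, and apple|pear|plum.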
arde
Obsidian | Level 7
thank you!
