Solved: How to remove duplicate values in a row

arde · Posted 08-04-2023 06:16 PM

Hi.

Is there a way to delete duplicates from a row?

Have

row	var1	var2	var3
1	a	b	b
2	c	a	c
3	a	b	c

Want

row	var1	var2	var3
1	a	b
2	c	a
3	a	b	c

Thank you

Astounding · Posted 08-04-2023 07:37 PM

The complexity of the program will vary depending on how many fields you are comparing, and whether their names really are var1, var2, var3, etc.

For the problem as posted, you could use something simple:

data want;
   set have;
   if var3 = var2 or var3 = var1 then var3 = ' ';
   if var2 = var1 then var2 = ' ';
run;

I suspect that this will not do in real life, and that the actual problem is a bit more complex. So describe more of what you are really facing. How many variables? What are the variable names? Are they all character variables?
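If there are more than three variables, the same pairwise comparison could be written with an array instead of one IF statement per pair. The following is only a minimal sketch under assumptions not confirmed in the thread: the variables are all character, they can be listed as var1-var3 (extend the list as needed), and comparisons are case-sensitive. The loop counters i and j are illustrative names.

```sas
data want;
   set have;
   /* Assumption: all variables to compare are character and named var1-var3 */
   array v {*} var1-var3;
   do i = 2 to dim(v);
      do j = 1 to i - 1;
         /* Blank out v{i} if an earlier variable in the row holds the same value */
         if not missing(v{i}) and v{i} = v{j} then v{i} = ' ';
      end;
   end;
   drop i j;
run;
```

With the posted data this blanks var3 in rows 1 and 2 and leaves row 3 unchanged. Values are not shifted left, so a blanked variable in the middle of the list stays blank.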


ballardw · Posted 08-04-2023 06:26 PM

Does the actual order of the values in the variables matter? Meaning, could the result for the second row look like this:

2	a	c

And in your real data, are the values longer than single characters? There are approaches that may work for single characters that wouldn't work for multi-word strings. That also raises the question of whether letter case makes a difference: is 'A' a duplicate of 'a', or "Apple" a duplicate of "apple"?

arde · Posted 08-04-2023 10:43 PM

The actual order doesn't matter. The real data will be more than a single character. The case will be the same.

Thank you!

arde · Posted 08-04-2023 10:45 PM

I'm using catx to add characters together separated by commas. But some of the columns have duplicated words. And I only need one of those words in the final catx variable.

mkeintz · Posted 08-04-2023 11:13 PM

@arde wrote:

I'm using catx to add characters together separated by commas. But some of the columns have duplicated words. And I only need one of those words in the final catx variable.

What if the duplicate is in the middle of the series of variables? I.e., if your values are

A A B

then setting the duplicate middle variable to blank and subsequently calling CATX('!',var1,var2,var3) gives A!B, because CATX omits blank arguments together with their delimiter. So blanking the duplicates first does not leave two consecutive delimiters in the result.

Alternatively, you could skip the blanking step and use CATX iteratively, concatenating only the non-duplicates (using '!' as the delimiter below):

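data want (drop=i);
  set have;
  array chr {*} var1-var3;  /* the character variables to combine */
  length new_text $60;      /* must hold all values plus delimiters */
  new_text=left(chr{1});    /* start the list with the first value */
  do i=2 to dim(chr);
    /* append chr{i} only if it is not already a '!'-delimited word in new_text */
    if findw(trim(new_text),trim(chr{i}),'!')=0 then new_text=catx('!',new_text,chr{i});
  end;
run;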
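A quick way to check how CATX treats a blank argument in the middle of the list is a small DATA _NULL_ step. As documented, CATX drops blank arguments together with their separator. In this sketch, literal values stand in for var1-var3, and the variable name s is only illustrative.

```sas
data _null_;
   length s $20;
   /* The middle argument is blank, as it would be after blanking a duplicate */
   s = catx('!', 'A', ' ', 'B');
   put s=;   /* writes s=A!B to the log */
run;
```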

Tom · Posted 08-04-2023 11:14 PM

@arde wrote:

I'm using catx to add characters together separated by commas. But some of the columns have duplicated words. And I only need one of those words in the final catx variable.

In that case there is no need to remove the duplicates. Just don't add them to the list.

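data want;
  set have;
  length string $200;
  array x var1-var3;  /* list the variables to combine; all must be character */
  do index=1 to dim(x);
    /* add the value only if it is not already a '|'-delimited word in string */
    if not indexw(string,x[index],'|') then
      string=catx('|',string,x[index]);
  end;
  drop index;
run;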
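For experimenting with the approaches in this thread, the HAVE table from the original question could be built roughly as follows. This is only a sketch: it assumes row is numeric and var1-var3 are character with the default length of 8, which the thread does not confirm. Any array list in the steps above would need to match these three variable names.

```sas
data have;
   /* List input: values separated by blanks, $ marks character variables */
   input row var1 $ var2 $ var3 $;
   datalines;
1 a b b
2 c a c
3 a b c
;
run;
```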

arde · Posted 08-04-2023 11:53 PM

thank you!
