Obsidian | Level 7

## How to remove duplicate values in a row

Hi.

Is there a way to delete duplicate values within a row?

Have

| row | var1 | var2 | var3 |
|-----|------|------|------|
| 1 | a | b | b |
| 2 | c | a | c |
| 3 | a | b | c |

Want

| row | var1 | var2 | var3 |
|-----|------|------|------|
| 1 | a | b |  |
| 2 | c | a |  |
| 3 | a | b | c |

Thank you
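For anyone who wants to try the answers below, the posted sample can be entered as a small data set. This is only a sketch: it assumes all three variables are character, which the thread does not confirm for the real data.

```sas
/* Sample data from the question; var1-var3 are assumed to be character */
data have;
  input row var1 $ var2 $ var3 $;
  datalines;
1 a b b
2 c a c
3 a b c
;
run;
```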

1 ACCEPTED SOLUTION

PROC Star

## Re: How to remove duplicate values in a row

The complexity of the program will vary depending on how many fields you are comparing, and whether their names really are var1, var2, var3, etc.

For the problem as posted, you could use something simple:

```sas
data want;
  set have;
  /* Blank var3 first, while var2 still holds its original value */
  if var3 = var2 or var3 = var1 then var3 = ' ';
  if var2 = var1 then var2 = ' ';
run;
```

I suspect that this will not do in real life, and that the actual problem is a bit more complex.  So describe more of what you are really facing.  How many variables?  What are the variable names?  Are they all character variables?
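If the variables really do share a numbered prefix, the same pairwise check might be generalized with an array. This is a minimal sketch, assuming every variable is character and the list is var1-var3 (widen the range as needed); the counters `i` and `j` are illustrative names, not part of the original answer.

```sas
data want (drop=i j);
  set have;
  array v {*} var1-var3;        /* assumed: all character variables */
  /* Work from the last variable backwards so that the earliest
     occurrence of each value is the one that survives */
  do i = dim(v) to 2 by -1;
    do j = 1 to i - 1;
      if v{i} = v{j} then do;
        v{i} = ' ';             /* blank the later duplicate */
        leave;                  /* no need to compare further */
      end;
    end;
  end;
run;
```

The number of comparisons grows with the square of the number of variables, which is usually acceptable for a few dozen columns.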

7 REPLIES
Super User

## Re: How to remove duplicate values in a row

Does the actual order of the values in the variables matter? That is, could the result for the second row look like:

 2 a c

And in your real data, are the values longer than single characters? Some approaches that work for single characters will not work for multi-word strings. That also raises the question of whether letter case makes a difference: is 'A' a duplicate of 'a', or is "Apple" a duplicate of "apple"?
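As a rough illustration of the case question, FINDW accepts an `i` modifier that makes the word search ignore case. The `words` value and the `|` delimiter below are made up for the example.

```sas
data _null_;
  words = 'Apple|pear';
  /* Case-sensitive: 'apple' is not treated as the word 'Apple' */
  hit_exact  = findw(words, 'apple', '|');
  /* The 'i' modifier ignores case, so 'apple' matches 'Apple' */
  hit_nocase = findw(words, 'apple', '|', 'i');
  put hit_exact= hit_nocase=;
run;
```

If case should be ignored, comparing `upcase()` of both values is another common option.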

Obsidian | Level 7

## Re: How to remove duplicate values in a row

The actual order doesn't matter. The real data will be more than a single character. Case will be the same.

Thank you!

Obsidian | Level 7

## Re: How to remove duplicate values in a row

I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.

PROC Star

## Re: How to remove duplicate values in a row

@arde wrote:

> I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.

What if the duplicate is in the middle of the series of variables?  I.e., if your values are

A  A   B

then you could set the duplicate middle variable to blank and subsequently use CATX('!',var1,var2,var3).  CATX skips blank arguments and inserts no delimiter for them, so the result is A!B rather than two consecutive exclamation marks.  That works, but it requires modifying the source variables first.

Alternatively, you could use CATX iteratively, concatenating only the non-duplicates and leaving the original variables untouched (using '!' as the delimiter below):

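```sas
data want (drop=i);
  set have;
  array chr {*} a b c;   /* character variables to scan; substitute your own names */
  length new_text $60;
  new_text = left(chr{1});
  do i = 2 to dim(chr);
    /* Append chr{i} only if it is not already a '!'-delimited word in new_text */
    if findw(trim(new_text), trim(chr{i}), '!') = 0 then new_text = catx('!', new_text, chr{i});
  end;
run;
```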

Super User

## Re: How to remove duplicate values in a row

@arde wrote:

> I'm using catx to add characters together separated by commas.  But some of the columns have duplicated words.  And I only need one of those words in the final catx variable.

In that case there is no need to remove the duplicates.  Just don't add them to the list.

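```sas
data want;
  set have;
  length string $200;
  array x var1-var4;   /* adjust the range to match your variables */
  do index = 1 to dim(x);
    /* Add the value only if it is not already a '|'-delimited word in string */
    if not indexw(string, x[index], '|') then
      string = catx('|', string, x[index]);
  end;
  drop index;
run;
```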
Obsidian | Level 7

## Re: How to remove duplicate values in a row

thank you!