Hi.
Is there a way to delete duplicate values within a row?
Have
row | var1 | var2 | var3 |
1 | a | b | b |
2 | c | a | c |
3 | a | b | c |
Want
row | var1 | var2 | var3 |
1 | a | b | |
2 | c | a | |
3 | a | b | c |
Thank you
The complexity of the program will vary depending on how many fields you are comparing, and on whether their names really are var1, var2, var3, etc.
For the problem as posted, you could use something simple:
data want;
  set have;
  /* blank var3 if it repeats var1 or var2 */
  if var3 = var2 or var3 = var1 then var3 = ' ';
  /* blank var2 if it repeats var1 */
  if var2 = var1 then var2 = ' ';
run;
I suspect that this will not do in real life, and that the actual problem is a bit more complex. So describe more of what you are really facing. How many variables? What are the variable names? Are they all character variables?
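If the real data has more columns, the pairwise IF statements above grow quickly. A more general sketch uses an array and nested DO loops to blank any value that already appeared earlier in the same row. It assumes all the variables are character and can be written as a range such as var1-var3; the HAVE data set here is just the example from the question.

```sas
/* Sample data from the question (assumed layout) */
data have;
  input row var1 $ var2 $ var3 $;
  datalines;
1 a b b
2 c a c
3 a b c
;
run;

data want;
  set have;
  array x {*} var1-var3;      /* list every variable to compare */
  do i = 2 to dim(x);
    do j = 1 to i - 1;
      /* blank a value that already appeared earlier in the row */
      if x[i] = x[j] then x[i] = ' ';
    end;
  end;
  drop i j;
run;
```

Note that the = comparison is case-sensitive, so 'A' and 'a' are not treated as duplicates by this sketch.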
Does the actual order of the values in the variables matter? That is, could the result for the second row look like:
2 | a | c |
And in your real data, are the values something other than single characters? Some approaches work for single characters but not for multi-word strings. That raises another question: does letter case make a difference? Is 'A' a duplicate of 'a', or is "Apple" a duplicate of "apple"?
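If order does not matter, one option is to sort the values within each row first, so duplicates end up next to each other, and then keep each value only once. This is a sketch, assuming character variables var1-var3 of the same length; CALL SORTC sorts the variables in place, so the original column order is lost, and the '|' delimiter is an arbitrary choice.

```sas
/* Sample data from the question (assumed layout) */
data have;
  input row var1 $ var2 $ var3 $;
  datalines;
1 a b b
2 c a c
3 a b c
;
run;

data want;
  set have;
  length string $ 200;
  array x {*} var1-var3;
  call sortc(of x[*]);        /* sort the values within this row, in place */
  string = x[1];
  do i = 2 to dim(x);
    /* after sorting, a duplicate sits right after its first copy */
    if x[i] ne x[i-1] then string = catx('|', string, x[i]);
  end;
  drop i;
run;
```

For the second row this produces a|c, matching the reordered result asked about above.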
I'm using catx to add characters together separated by commas. But some of the columns have duplicated words. And I only need one of those words in the final catx variable.
@arde wrote:
I'm using catx to add characters together separated by commas. But some of the columns have duplicated words. And I only need one of those words in the final catx variable.
What if the duplicate is in the middle of the series of variables? I.e., if your values are
A A B
then you might expect that setting the duplicate middle variable to blank and then using CATX('!',var1,var2,var3) would leave two consecutive delimiters (two exclamation marks in this example) in the middle. In fact CATX skips blank arguments together with their delimiters, so the result would be A!B.
Rather than blanking values and concatenating afterwards, you could also use CATX iteratively, concatenating only the non-duplicates (using '!' as the delimiter below):
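data want (drop=i);
  set have;
  array chr {*} var1-var3;    /* the variables to combine, named as in the question */
  length new_text $60;
  new_text=left(chr{1});      /* start with the first value */
  do i=2 to dim(chr);
    /* append a value only if it is not already a '!'-delimited word in new_text */
    if findw(trim(new_text),trim(chr{i}),'!')=0 then new_text=catx('!',new_text,chr{i});
  end;
run;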
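A quick way to see what CATX does with a blank argument (for instance, a duplicate that has been blanked out) is to try it directly. Per the SAS documentation, CATX omits blank or missing arguments along with their delimiters, so this is expected to write result=A!B to the log.

```sas
data _null_;
  length result $ 20;
  /* the middle argument is blank, as if a duplicate had been blanked out */
  result = catx('!', 'A', ' ', 'B');
  put result=;
run;
```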
@arde wrote:
I'm using catx to add characters together separated by commas. But some of the columns have duplicated words. And I only need one of those words in the final catx variable.
In that case there is no need to remove the duplicates. Just don't add them to the list.
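data want;
  set have;
  length string $200;
  array x var1-var3;    /* the question has var1-var3; extend the range for more columns */
  do index=1 to dim(x);
    /* add a value only if it is not already a '|'-delimited word in STRING */
    if not indexw(string,x[index],'|') then string=catx('|',string,x[index]);
  end;
  drop index;
run;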
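If "Apple" and "apple" should count as duplicates (the letter-case question raised earlier in the thread), one variation is to keep a second, uppercased list used only for the comparison, so the output keeps the spelling of the first occurrence. KEY is a hypothetical helper variable, and the sample data below is made up for illustration; TRIM and STRIP are added so trailing blanks do not affect the word match.

```sas
/* Made-up sample data with mixed case and multi-word values */
data have;
  length var1-var3 $ 20;
  infile datalines dlm='|' dsd;
  input var1 $ var2 $ var3 $;
  datalines;
Apple|apple|Pear
red apple|Red Apple|pear
;
run;

data want;
  set have;
  length string key $ 200;
  array x {*} var1-var3;
  do index = 1 to dim(x);
    /* compare against the uppercased list so that case is ignored */
    if not indexw(trim(key), strip(upcase(x[index])), '|') then do;
      string = catx('|', string, x[index]);        /* keep the original spelling */
      key    = catx('|', key, upcase(x[index]));    /* remember the uppercased value */
    end;
  end;
  drop index key;
run;
```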