you have column A with multiple values separated by comas and some values appear more than once in a cell.
How can you copy column A into column B without duplicate values .
For example. in column A:
A1 = a, b
A2 =
A3 = c, b, c
A4 = a, b, b,c,a
A5 = d
Column B would be:
B1 = a, b
B2 =
B3 = c, b
B4 = a, b, c
B5 = d
Please provide examples.
This looks like an Excel question. Is it?
I don't think so, because there are other columns that have to piggyback without change.
For a SAS table something like below should work.
data have;
infile datalines truncover;
input a $10.;
datalines;
a, b
c, b, c
a, b, b,c,a
d
;
data want(drop=_:);
set have;
array words{100} $10 _temporary_;
length _word $10;
b=a;
do _i=1 to dim(words) until(missing(_word));
_word=scan(b,_i,',');
if whichc(_word,of words[*])=0 then words{_i}=_word;
end;
b=catx(',',of words[*]);
call missing(of words[*]);
run;
proc print data=want;
run;
thank you for your help. I appreciate that
You should ask this in an Excel-oriented forum.
With a SAS dataset, I would first expand this to a vertical layout, after which a SORT with NODUPKEY will do it.
@melassiri wrote:
I don't think so, because there are other columns that have to piggyback without change.
Why would that matter? When you expand the comma separated list into multiple observations they will all have the exact same values for the "other" variables. So when you collapse them again (if you ever need to) there is no problem.
Of course it is easiest if the there is a simple BY variable (or perhaps a couple) that uniquely identify the original observations. Of course you can always add one while you expand to multiple observations.
data tall;
id+1;
set have;
do i=1 to max(1,countw(A,','));
value = strip(scan(A,i,', '));
output;
end;
drop i ;
run;
proc sort data=tall nodupkey out=want;
by id value;
run;
data want;
do until(last.id);
set want;
by id;
length B $100;
b = catx(',',b,value);
end;
drop value;
run;
You could try this:
data have;
infile datalines truncover;
input a $10.;
datalines;
a, b
c, b, c
a, b, b,c,a
d
;
run;
filename f TEMP;
data _null_; file f; put; run;
%let N=20; /* expected max number of elements */
data want;
infile f dlm=" ," truncover;
set have;
input @@;
_infile_ = a;
input @1 (n1-n&N.) (:$1.) @@;
array n(i) n:;
array m[&N.] $ 1;
do over n;
if n NE " " AND NOT (n IN m) then m(i) = n;
end;
b = catx(",", of m[*]);
keep a b;
run;
filename f clear;
proc print;
run;
Filename and data _null_ is just a "fake" to trick the infile statement in the data want;
In data want logic is the following:
1) convince SAS it will be reading a text file
2) read data form HAVE
3) pretend you are reading text file
4) replace the buffer with your variable value
5) re-read the buffer and create &N. character variables
6) initiate arrays of n's and m's of the same size &N.
7) loop over array N,
😎 if variable N is not missing and is not in array M add it to M
9) catX M variables into comma separated string
[EDIT:] that sunglasses face 😎 should be "eight & closing parenthesis"
Bart
You could scan each "word" (i.e. each comma-separated value) in A and add it to B if it isn't already there:
data have;
infile datalines truncover;
input a $10.;
datalines;
a, b
c, b, c
a, b, b,c,a
d
;
data want (drop=w word);
set have;
length b $10;
do w=1 to countw(a,',');
word=scan(a,w,', ');
if findw(b,trim(word),', ')=0 then b=catx(',',b,word);
end;
b=transtrn(b,',',', '); *Keep this if you want blanks after commas *;
run;
If you allow to sort the content of column A, then prxchange can be used:
data have;
infile datalines truncover;
input a $11.;
datalines;
a, b
c, b, c
a, b, b,c,a
d
;
data want;
set have;
array _dummy_[100] $1 _temporary_;
call missing(of _dummy_[*]);
cmpres_a=compress(a,' ,');
call pokelong(cmpres_a,addrlong(_dummy_[1]),length(cmpres_a));
call sort(of _dummy_[*]);
new_a=prxchange('s/\b(\w), (\1)\b/$1/',-1,catx(', ',of _dummy_[*]));
put new_a=;
run;
The result will be:
new_a=a, b new_a= new_a=b, c new_a=a, b, c new_a=d
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.