BookmarkSubscribeRSS Feed
melassiri
Fluorite | Level 6

you have column A with multiple values separated by comas and some values appear more than once in a cell.
How can you copy column A into column B without duplicate values .
For example. in column A:
A1 = a, b
A2 =
A3 = c, b, c
A4 = a, b, b,c,a
A5 = d
Column B would be:
B1 = a, b
B2 =
B3 = c, b
B4 = a, b, c
B5 = d
Please provide examples.

10 REPLIES 10
PaigeMiller
Diamond | Level 26

This looks like an Excel question. Is it?

--
Paige Miller
melassiri
Fluorite | Level 6

I don't think so, because there are other columns that have to piggyback without change.

Patrick
Opal | Level 21

For a SAS table something like below should work.

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;

data want(drop=_:);
  set have;
  array words{100} $10 _temporary_;
  length _word $10;

  b=a;
  do _i=1 to dim(words) until(missing(_word));
    _word=scan(b,_i,',');
    if whichc(_word,of words[*])=0 then words{_i}=_word;
  end;
  b=catx(',',of words[*]);
  call missing(of words[*]);
run;

proc print data=want;
run;

 

Patrick_0-1709726168916.png

 

melassiri
Fluorite | Level 6

thank you for your help. I appreciate that

Kurt_Bremser
Super User

You should ask this in an Excel-oriented forum.

With a SAS dataset, I would first expand this to a vertical layout, after which a SORT with NODUPKEY will do it.

melassiri
Fluorite | Level 6
I don't think so, because there are other columns that have to piggyback without change.
Tom
Super User Tom
Super User

@melassiri wrote:
I don't think so, because there are other columns that have to piggyback without change.

Why would that matter?  When you expand the comma separated list into multiple observations they will all have the exact same values for the "other" variables.  So when you collapse them again (if you ever need to) there is no problem.

 

Of course it is easiest if the there is a simple BY variable (or perhaps a couple) that uniquely identify the original observations.  Of course you can always add one while you expand to multiple observations.

data tall;
  id+1;
  set have;
  do i=1 to max(1,countw(A,','));
    value = strip(scan(A,i,', '));
    output; 
  end;
  drop i ;
run;
proc sort data=tall nodupkey out=want;
  by id value;
run;
data want;
  do until(last.id);
    set want;
    by id;
    length B $100;
    b = catx(',',b,value);
  end;
  drop value;
run;

 

yabwon
Onyx | Level 15

You could try this:

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;
run;

filename f TEMP;
data _null_; file f; put; run;

%let N=20; /* expected max number of elements */

data want;
  infile f dlm=" ," truncover;
  set have;

  input @@;
  _infile_ = a;
  input @1 (n1-n&N.) (:$1.) @@;

  array n(i) n:;
  array m[&N.] $ 1;

  do over n;
    if n NE " " AND NOT (n IN m) then m(i) = n;
  end;

  b = catx(",", of m[*]);
  keep a b;
run;

filename f clear;


proc print;
run;

Filename  and data _null_ is just a "fake" to trick the infile statement in the data want;

 

In data want logic is the following:

1) convince SAS it will be reading a text file

2) read data form HAVE

3) pretend you are reading text file

4) replace the buffer with your variable value

5) re-read the buffer and create &N. character variables

6) initiate arrays of n's and m's of the same size &N.

7) loop over array N, 

😎 if variable N is not missing and is not in array M add it to M

9) catX M variables into comma separated string

 

[EDIT:] that sunglasses face 😎 should be "eight & closing parenthesis"

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mkeintz
PROC Star

You could scan each "word" (i.e. each comma-separated value) in A and add it to B if it isn't already there:

 

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b
c, b, c
a, b, b,c,a
d
;

data want (drop=w word);
  set have;
  length b $10;
  do w=1 to countw(a,',');
    word=scan(a,w,', ');
    if findw(b,trim(word),', ')=0 then b=catx(',',b,word);
  end;
  b=transtrn(b,',',', ');  *Keep this if you want blanks after commas *;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
whymath
Lapis Lazuli | Level 10

If you allow to sort the content of column A, then prxchange can be used:

data have;
  infile datalines truncover;
  input a $11.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;

data want;
  set have;

  array _dummy_[100] $1 _temporary_;
  call missing(of _dummy_[*]);
  cmpres_a=compress(a,' ,');
  call pokelong(cmpres_a,addrlong(_dummy_[1]),length(cmpres_a));
  call sort(of _dummy_[*]);
  new_a=prxchange('s/\b(\w), (\1)\b/$1/',-1,catx(', ',of _dummy_[*]));
  put new_a=;
run;

The result will be:

new_a=a, b
new_a=
new_a=b, c
new_a=a, b, c
new_a=d

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1332 views
  • 4 likes
  • 8 in conversation