BookmarkSubscribeRSS Feed
melassiri
Calcite | Level 5

you have column A with multiple values separated by comas and some values appear more than once in a cell.
How can you copy column A into column B without duplicate values .
For example. in column A:
A1 = a, b
A2 =
A3 = c, b, c
A4 = a, b, b,c,a
A5 = d
Column B would be:
B1 = a, b
B2 =
B3 = c, b
B4 = a, b, c
B5 = d
Please provide examples.

10 REPLIES 10
PaigeMiller
Diamond | Level 26

This looks like an Excel question. Is it?

--
Paige Miller
melassiri
Calcite | Level 5

I don't think so, because there are other columns that have to piggyback without change.

Patrick
Opal | Level 21

For a SAS table something like below should work.

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;

data want(drop=_:);
  set have;
  array words{100} $10 _temporary_;
  length _word $10;

  b=a;
  do _i=1 to dim(words) until(missing(_word));
    _word=scan(b,_i,',');
    if whichc(_word,of words[*])=0 then words{_i}=_word;
  end;
  b=catx(',',of words[*]);
  call missing(of words[*]);
run;

proc print data=want;
run;

 

Patrick_0-1709726168916.png

 

melassiri
Calcite | Level 5

thank you for your help. I appreciate that

Kurt_Bremser
Super User

You should ask this in an Excel-oriented forum.

With a SAS dataset, I would first expand this to a vertical layout, after which a SORT with NODUPKEY will do it.

melassiri
Calcite | Level 5
I don't think so, because there are other columns that have to piggyback without change.
Tom
Super User Tom
Super User

@melassiri wrote:
I don't think so, because there are other columns that have to piggyback without change.

Why would that matter?  When you expand the comma separated list into multiple observations they will all have the exact same values for the "other" variables.  So when you collapse them again (if you ever need to) there is no problem.

 

Of course it is easiest if the there is a simple BY variable (or perhaps a couple) that uniquely identify the original observations.  Of course you can always add one while you expand to multiple observations.

data tall;
  id+1;
  set have;
  do i=1 to max(1,countw(A,','));
    value = strip(scan(A,i,', '));
    output; 
  end;
  drop i ;
run;
proc sort data=tall nodupkey out=want;
  by id value;
run;
data want;
  do until(last.id);
    set want;
    by id;
    length B $100;
    b = catx(',',b,value);
  end;
  drop value;
run;

 

yabwon
Onyx | Level 15

You could try this:

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;
run;

filename f TEMP;
data _null_; file f; put; run;

%let N=20; /* expected max number of elements */

data want;
  infile f dlm=" ," truncover;
  set have;

  input @@;
  _infile_ = a;
  input @1 (n1-n&N.) (:$1.) @@;

  array n(i) n:;
  array m[&N.] $ 1;

  do over n;
    if n NE " " AND NOT (n IN m) then m(i) = n;
  end;

  b = catx(",", of m[*]);
  keep a b;
run;

filename f clear;


proc print;
run;

Filename  and data _null_ is just a "fake" to trick the infile statement in the data want;

 

In data want logic is the following:

1) convince SAS it will be reading a text file

2) read data form HAVE

3) pretend you are reading text file

4) replace the buffer with your variable value

5) re-read the buffer and create &N. character variables

6) initiate arrays of n's and m's of the same size &N.

7) loop over array N, 

😎 if variable N is not missing and is not in array M add it to M

9) catX M variables into comma separated string

 

[EDIT:] that sunglasses face 😎 should be "eight & closing parenthesis"

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



mkeintz
PROC Star

You could scan each "word" (i.e. each comma-separated value) in A and add it to B if it isn't already there:

 

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b
c, b, c
a, b, b,c,a
d
;

data want (drop=w word);
  set have;
  length b $10;
  do w=1 to countw(a,',');
    word=scan(a,w,', ');
    if findw(b,trim(word),', ')=0 then b=catx(',',b,word);
  end;
  b=transtrn(b,',',', ');  *Keep this if you want blanks after commas *;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
whymath
Lapis Lazuli | Level 10

If you allow to sort the content of column A, then prxchange can be used:

data have;
  infile datalines truncover;
  input a $11.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;

data want;
  set have;

  array _dummy_[100] $1 _temporary_;
  call missing(of _dummy_[*]);
  cmpres_a=compress(a,' ,');
  call pokelong(cmpres_a,addrlong(_dummy_[1]),length(cmpres_a));
  call sort(of _dummy_[*]);
  new_a=prxchange('s/\b(\w), (\1)\b/$1/',-1,catx(', ',of _dummy_[*]));
  put new_a=;
run;

The result will be:

new_a=a, b
new_a=
new_a=b, c
new_a=a, b, c
new_a=d

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 577 views
  • 4 likes
  • 8 in conversation