Re: Double value elimination on one cell

melassiri · Posted 03-06-2024 06:13 AM

you have column A with multiple values separated by comas and some values appear more than once in a cell.
How can you copy column A into column B without duplicate values .
For example. in column A:
A1 = a, b
A2 =
A3 = c, b, c
A4 = a, b, b,c,a
A5 = d
Column B would be:
B1 = a, b
B2 =
B3 = c, b
B4 = a, b, c
B5 = d
Please provide examples.

PaigeMiller · Posted 03-06-2024 06:18 AM

This looks like an Excel question. Is it?

--
Paige Miller

melassiri · Posted 03-06-2024 11:36 AM

I don't think so, because there are other columns that have to piggyback without change.

Patrick · Posted 03-06-2024 06:55 AM

For a SAS table something like below should work.

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;

data want(drop=_:);
  set have;
  array words{100} $10 _temporary_;
  length _word $10;

  b=a;
  do _i=1 to dim(words) until(missing(_word));
    _word=scan(b,_i,',');
    if whichc(_word,of words[*])=0 then words{_i}=_word;
  end;
  b=catx(',',of words[*]);
  call missing(of words[*]);
run;

proc print data=want;
run;

melassiri · Posted 03-06-2024 11:40 AM

thank you for your help. I appreciate that

Kurt_Bremser · Posted 03-06-2024 07:09 AM

You should ask this in an Excel-oriented forum.

With a SAS dataset, I would first expand this to a vertical layout, after which a SORT with NODUPKEY will do it.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

melassiri · Posted 03-06-2024 11:37 AM

I don't think so, because there are other columns that have to piggyback without change.

Tom · Posted 03-07-2024 12:16 AM

@melassiri wrote:
I don't think so, because there are other columns that have to piggyback without change.

Why would that matter? When you expand the comma separated list into multiple observations they will all have the exact same values for the "other" variables. So when you collapse them again (if you ever need to) there is no problem.

Of course it is easiest if the there is a simple BY variable (or perhaps a couple) that uniquely identify the original observations. Of course you can always add one while you expand to multiple observations.

data tall;
  id+1;
  set have;
  do i=1 to max(1,countw(A,','));
    value = strip(scan(A,i,', '));
    output; 
  end;
  drop i ;
run;
proc sort data=tall nodupkey out=want;
  by id value;
run;
data want;
  do until(last.id);
    set want;
    by id;
    length B $100;
    b = catx(',',b,value);
  end;
  drop value;
run;

yabwon · Posted 03-06-2024 12:18 PM

You could try this:

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;
run;

filename f TEMP;
data _null_; file f; put; run;

%let N=20; /* expected max number of elements */

data want;
  infile f dlm=" ," truncover;
  set have;

  input @@;
  _infile_ = a;
  input @1 (n1-n&N.) (:$1.) @@;

  array n(i) n:;
  array m[&N.] $ 1;

  do over n;
    if n NE " " AND NOT (n IN m) then m(i) = n;
  end;

  b = catx(",", of m[*]);
  keep a b;
run;

filename f clear;


proc print;
run;

Filename and data _null_ is just a "fake" to trick the infile statement in the data want;

In data want logic is the following:

1) convince SAS it will be reading a text file

2) read data form HAVE

3) pretend you are reading text file

4) replace the buffer with your variable value

5) re-read the buffer and create &N. character variables

6) initiate arrays of n's and m's of the same size &N.

7) loop over array N,

😎 if variable N is not missing and is not in array M add it to M

9) catX M variables into comma separated string

[EDIT:] that sunglasses face 😎 should be "eight & closing parenthesis"

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

mkeintz · Posted 03-06-2024 08:44 PM

You could scan each "word" (i.e. each comma-separated value) in A and add it to B if it isn't already there:

data have;
  infile datalines truncover;
  input a $10.;
  datalines;
a, b
c, b, c
a, b, b,c,a
d
;

data want (drop=w word);
  set have;
  length b $10;
  do w=1 to countw(a,',');
    word=scan(a,w,', ');
    if findw(b,trim(word),', ')=0 then b=catx(',',b,word);
  end;
  b=transtrn(b,',',', ');  *Keep this if you want blanks after commas *;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

whymath · Posted 03-06-2024 09:55 PM

If you allow to sort the content of column A, then prxchange can be used:

data have;
  infile datalines truncover;
  input a $11.;
  datalines;
a, b

c, b, c
a, b, b,c,a
d
;

data want;
  set have;

  array _dummy_[100] $1 _temporary_;
  call missing(of _dummy_[*]);
  cmpres_a=compress(a,' ,');
  call pokelong(cmpres_a,addrlong(_dummy_[1]),length(cmpres_a));
  call sort(of _dummy_[*]);
  new_a=prxchange('s/\b(\w), (\1)\b/$1/',-1,catx(', ',of _dummy_[*]));
  put new_a=;
run;

The result will be:

new_a=a, b
new_a=
new_a=b, c
new_a=a, b, c
new_a=d

SAS Innovate 2025: Call for Content

Classroom Training Available!