🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fluorite | Level 6

how to find the difference from two columns which contains comma separated values

Hi Good Evening,

I have a dataset that has two columns A and B which having comma separated values. i wanted to compare these two columns A and B and i need to take the difference in both these columns and capture it in a new column.

Shown below:

Thanks in advance 🙂

Regards,

Sarav

1 ACCEPTED SOLUTION

Accepted Solutions
Tourmaline | Level 20

Re: how to find the difference from two columns which contains comma separated values

/*Or*/

``````
data mydata;
input a:\$30. b:\$30.;
datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;

data want;
set mydata;
length want \$100;
do _n_=1 to countw(b,',');
temp=scan(b,_n_,',');
if ^index(a,strip(temp)) then want=catx(',',want,temp);
end;
do _n_=1 to countw(a,',');
temp=scan(a,_n_,',');
if ^index(b,strip(temp)) then want=catx(',',want,temp);
end;
drop temp;
run;``````

9 REPLIES 9
Meteorite | Level 14

Re: how to find the difference from two columns which contains comma separated values

Hi @sarav93

Here is an approach to achieve this:

``````
data mydata;
input a:\$30. b:\$30.;
id+1;
datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;

data mydata_long;
set mydata;
do i=1 to countw(a,',');
a1 = scan(a,i);
b1 = scan(b,i);
output;
end;
keep a1 b1 id;
run;

proc sql;
create table diff as
(select id, a1 from mydata_long
except
select id, b1 from mydata_long)
union
(select id, b1 from mydata_long
except
select id, a1 from mydata_long);
quit;

data diff2 (keep=id expected);
do until (last.ID);
set diff;
by id;
length expected \$50.;
expected = catx(",",expected,a1);
end;
run;

proc sql;
create table want as
select a, b, expected
from mydata as a full join diff2 as b
on a.ID = b.ID;
quit;
``````

Best,

Fluorite | Level 6

Re: how to find the difference from two columns which contains comma separated values

Hi @ed_sas_master,

Thanks for your efforts.

i need your help more on this,

data mydata_long;
set mydata;
do i=1 to countw(a,',');
a1 = scan(a,i);
b1 = scan(b,i);
output;
end;
keep a1 b1 id;
run;

if the length of a and b varies the output diff will also get changed.

ex: if column a contains 205,8010 and column b contains 205,200,102 in this case the above code won't work.

do i=1 to countw(a,','); in this scenario Column A contains 2 and B contains 3 values.
as per this it will skip the 3rd value and process with other.

Thanks in advance 🙂
Meteorite | Level 14

Re: how to find the difference from two columns which contains comma separated values

Hi @sarav93

You're absolutely right.

In my previous code, I made the assumption that there was the same number of values in a and b.

If they can be different, you can modify the do loop statement as follows:

``do i=1 to max(countw(a,','), countw(b,','));``

Best,

Super User

Re: how to find the difference from two columns which contains comma separated values

@sarav93 wrote:
Hi @ed_sas_master,

Thanks for your efforts.

i need your help more on this,

data mydata_long;
set mydata;
do i=1 to countw(a,',');
a1 = scan(a,i);
b1 = scan(b,i);
output;
end;
keep a1 b1 id;
run;

if the length of a and b varies the output diff will also get changed.

ex: if column a contains 205,8010 and column b contains 205,200,102 in this case the above code won't work.

do i=1 to countw(a,','); in this scenario Column A contains 2 and B contains 3 values.
as per this it will skip the 3rd value and process with other.

Thanks in advance 🙂

"Doesn't work" does not explain what is actually needed. Please provide rules.

And example data that represents all of the potential cases that are in your data.

You need to provide rules for what gets added to what when the string element list counts differ. If one has 3 elements and the other 2 it is not obvious to me which values get added to what. The short one might mean that that the first value is supposed to be added to second of the longer, or that the second of the shorter gets added to the third of the longer. And what about 6 elements in and 2 in the other? There are getting to be potentially a moderate number of cases that need to be considered. So, exactly what is the rule on how to combine non-matching number of elements? And you need rules that handle every possible mismatched number in your data if any of the rules ever involve processing one out of order.

Tourmaline | Level 20

Re: how to find the difference from two columns which contains comma separated values

Hi @sarav93  The process seems rather straight forward linear search to me-

``````data mydata;
input a:\$30. b:\$30.;
datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;

data want;
set mydata;
array t(999)\$100 _temporary_;
array u(999)\$100 _temporary_;
call missing(of t(*),of u(*));
do _n_=1 to countw(b,',');
temp=scan(b,_n_,',');
if ^index(a,strip(temp)) then t(_n_)=temp;
end;
do _n_=1 to countw(a,',');
temp=scan(a,_n_,',');
if ^index(b,strip(temp)) then u(_n_)=temp;
end;
length want \$100;
want=catx(',',of u(*),of t(*));
drop temp;
run;``````

Tourmaline | Level 20

Re: how to find the difference from two columns which contains comma separated values

/*Or*/

``````
data mydata;
input a:\$30. b:\$30.;
datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;

data want;
set mydata;
length want \$100;
do _n_=1 to countw(b,',');
temp=scan(b,_n_,',');
if ^index(a,strip(temp)) then want=catx(',',want,temp);
end;
do _n_=1 to countw(a,',');
temp=scan(a,_n_,',');
if ^index(b,strip(temp)) then want=catx(',',want,temp);
end;
drop temp;
run;``````

Fluorite | Level 6

Re: how to find the difference from two columns which contains comma separated values

thankyou !!! @novinosrin
Opal | Level 21

Re: how to find the difference from two columns which contains comma separated values

Did any of the replies give you the answer you need. If so then please mark it as solution; If not please tell us what's still missing/not working.

Also: Please don't paste sample data as a screenshot but provide a SAS data step which creates such data like @ed_sas_member has done for you. That's just so we can spend the time resolving your challenge and provide tested code instead of spending the time creating sample data.

Fluorite | Level 6

Re: how to find the difference from two columns which contains comma separated values

sure Patrick.
Discussion stats
• 9 replies
• 1783 views
• 9 likes
• 5 in conversation