Hi Good Evening,
I have a dataset that has two columns A and B which having comma separated values. i wanted to compare these two columns A and B and i need to take the difference in both these columns and capture it in a new column.
Shown below:
Thanks in advance 🙂
Regards,
Sarav
/*Or*/
data mydata;
input a:$30. b:$30.;
datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;
data want;
set mydata;
length want $100;
do _n_=1 to countw(b,',');
temp=scan(b,_n_,',');
if ^index(a,strip(temp)) then want=catx(',',want,temp);
end;
do _n_=1 to countw(a,',');
temp=scan(a,_n_,',');
if ^index(b,strip(temp)) then want=catx(',',want,temp);
end;
drop temp;
run;
Hi @sarav93
Here is an approach to achieve this:
data mydata;
input a:$30. b:$30.;
id+1;
datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;
data mydata_long;
set mydata;
do i=1 to countw(a,',');
a1 = scan(a,i);
b1 = scan(b,i);
output;
end;
keep a1 b1 id;
run;
proc sql;
create table diff as
(select id, a1 from mydata_long
except
select id, b1 from mydata_long)
union
(select id, b1 from mydata_long
except
select id, a1 from mydata_long);
quit;
data diff2 (keep=id expected);
do until (last.ID);
set diff;
by id;
length expected $50.;
expected = catx(",",expected,a1);
end;
run;
proc sql;
create table want as
select a, b, expected
from mydata as a full join diff2 as b
on a.ID = b.ID;
quit;
Best,
Hi @sarav93
You're absolutely right.
In my previous code, I made the assumption that there was the same number of values in a and b.
If they can be different, you can modify the do loop statement as follows:
do i=1 to max(countw(a,','), countw(b,','));
Best,
@sarav93 wrote:
Hi @ed_sas_master,
Thanks for your efforts.
i need your help more on this,
data mydata_long;
set mydata;
do i=1 to countw(a,',');
a1 = scan(a,i);
b1 = scan(b,i);
output;
end;
keep a1 b1 id;
run;
if the length of a and b varies the output diff will also get changed.
ex: if column a contains 205,8010 and column b contains 205,200,102 in this case the above code won't work.
do i=1 to countw(a,','); in this scenario Column A contains 2 and B contains 3 values.
as per this it will skip the 3rd value and process with other.
Thanks in advance 🙂
"Doesn't work" does not explain what is actually needed. Please provide rules.
And example data that represents all of the potential cases that are in your data.
You need to provide rules for what gets added to what when the string element list counts differ. If one has 3 elements and the other 2 it is not obvious to me which values get added to what. The short one might mean that that the first value is supposed to be added to second of the longer, or that the second of the shorter gets added to the third of the longer. And what about 6 elements in and 2 in the other? There are getting to be potentially a moderate number of cases that need to be considered. So, exactly what is the rule on how to combine non-matching number of elements? And you need rules that handle every possible mismatched number in your data if any of the rules ever involve processing one out of order.
Hi @sarav93 The process seems rather straight forward linear search to me-
data mydata;
input a:$30. b:$30.;
datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;
data want;
set mydata;
array t(999)$100 _temporary_;
array u(999)$100 _temporary_;
call missing(of t(*),of u(*));
do _n_=1 to countw(b,',');
temp=scan(b,_n_,',');
if ^index(a,strip(temp)) then t(_n_)=temp;
end;
do _n_=1 to countw(a,',');
temp=scan(a,_n_,',');
if ^index(b,strip(temp)) then u(_n_)=temp;
end;
length want $100;
want=catx(',',of u(*),of t(*));
drop temp;
run;
/*Or*/
data mydata;
input a:$30. b:$30.;
datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;
data want;
set mydata;
length want $100;
do _n_=1 to countw(b,',');
temp=scan(b,_n_,',');
if ^index(a,strip(temp)) then want=catx(',',want,temp);
end;
do _n_=1 to countw(a,',');
temp=scan(a,_n_,',');
if ^index(b,strip(temp)) then want=catx(',',want,temp);
end;
drop temp;
run;
Did any of the replies give you the answer you need. If so then please mark it as solution; If not please tell us what's still missing/not working.
Also: Please don't paste sample data as a screenshot but provide a SAS data step which creates such data like @ed_sas_member has done for you. That's just so we can spend the time resolving your challenge and provide tested code instead of spending the time creating sample data.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.