BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sarav93
Fluorite | Level 6

 

Hi Good Evening,

 

I have a dataset that has two columns A and B which having comma separated values. i wanted to compare these two columns A and B and i need to take the difference in both these columns and capture it in a new column.

 

Shown below:

 

sarav93_0-1592566689624.png

 

Thanks in advance 🙂

 

Regards,

Sarav

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

/*Or*/

 


data mydata;
	input a:$30. b:$30.;
	datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;


data want;
 set mydata;
 length want $100;
 do _n_=1 to countw(b,',');
  temp=scan(b,_n_,',');
  if ^index(a,strip(temp)) then want=catx(',',want,temp);
 end;
 do _n_=1 to countw(a,',');
  temp=scan(a,_n_,',');
  if ^index(b,strip(temp)) then want=catx(',',want,temp);
 end;
 drop temp;
run;

 

View solution in original post

9 REPLIES 9
ed_sas_member
Meteorite | Level 14

Hi @sarav93 

 

Here is an approach to achieve this:


data mydata;
	input a:$30. b:$30.;
	id+1;
	datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;

data mydata_long;
	set mydata;
	do i=1 to countw(a,',');
		a1 = scan(a,i);
		b1 = scan(b,i);
		output;
	end;
	keep a1 b1 id;
run;

proc sql;
	create table diff as
	(select id, a1 from mydata_long
	except 
	select id, b1 from mydata_long)
	union 
	(select id, b1 from mydata_long
	except 
	select id, a1 from mydata_long);
quit;

data diff2 (keep=id expected);
	do until (last.ID);
		set diff;
		by id;
		length expected $50.;
		expected = catx(",",expected,a1);
	end;
run;

proc sql;
	create table want as
	select a, b, expected
	from mydata as a full join diff2 as b
	on a.ID = b.ID;
quit;
	

Best,

sarav93
Fluorite | Level 6
Hi @ed_sas_master,

Thanks for your efforts.

i need your help more on this,

data mydata_long;
set mydata;
do i=1 to countw(a,',');
a1 = scan(a,i);
b1 = scan(b,i);
output;
end;
keep a1 b1 id;
run;

if the length of a and b varies the output diff will also get changed.

ex: if column a contains 205,8010 and column b contains 205,200,102 in this case the above code won't work.

do i=1 to countw(a,','); in this scenario Column A contains 2 and B contains 3 values.
as per this it will skip the 3rd value and process with other.

Thanks in advance 🙂
ed_sas_member
Meteorite | Level 14

Hi @sarav93 

You're absolutely right.

In my previous code, I made the assumption that there was the same number of values in a and b.

If they can be different, you can modify the do loop statement as follows:

do i=1 to max(countw(a,','), countw(b,','));

Best,

ballardw
Super User

@sarav93 wrote:
Hi @ed_sas_master,

Thanks for your efforts.

i need your help more on this,

data mydata_long;
set mydata;
do i=1 to countw(a,',');
a1 = scan(a,i);
b1 = scan(b,i);
output;
end;
keep a1 b1 id;
run;

if the length of a and b varies the output diff will also get changed.

ex: if column a contains 205,8010 and column b contains 205,200,102 in this case the above code won't work.

do i=1 to countw(a,','); in this scenario Column A contains 2 and B contains 3 values.
as per this it will skip the 3rd value and process with other.

Thanks in advance 🙂

"Doesn't work" does not explain what is actually needed. Please provide rules.

And example data that represents all of the potential cases that are in your data.

 

You need to provide rules for what gets added to what when the string element list counts differ. If one has 3 elements and the other 2 it is not obvious to me which values get added to what. The short one might mean that that the first value is supposed to be added to second of the longer, or that the second of the shorter gets added to the third of the longer. And what about 6 elements in and 2 in the other? There are getting to be potentially a moderate number of cases that need to be considered. So, exactly what is the rule on how to combine non-matching number of elements? And you need rules that handle every possible mismatched number in your data if any of the rules ever involve processing one out of order.

novinosrin
Tourmaline | Level 20

Hi @sarav93  The process seems rather straight forward linear search to me-

data mydata;
	input a:$30. b:$30.;
	datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;


data want;
 set mydata;
 array t(999)$100 _temporary_;
 array u(999)$100 _temporary_;
 call missing(of t(*),of u(*));
 do _n_=1 to countw(b,',');
  temp=scan(b,_n_,',');
  if ^index(a,strip(temp)) then t(_n_)=temp;
 end;
 do _n_=1 to countw(a,',');
  temp=scan(a,_n_,',');
  if ^index(b,strip(temp)) then u(_n_)=temp;
 end;
 length want $100;
 want=catx(',',of u(*),of t(*));
 drop temp;
run;

 

novinosrin
Tourmaline | Level 20

/*Or*/

 


data mydata;
	input a:$30. b:$30.;
	datalines;
61,213,625,637,8010 61,625,637,650,8010
61,213,625,650,8010 61,213,637,650,8010
;


data want;
 set mydata;
 length want $100;
 do _n_=1 to countw(b,',');
  temp=scan(b,_n_,',');
  if ^index(a,strip(temp)) then want=catx(',',want,temp);
 end;
 do _n_=1 to countw(a,',');
  temp=scan(a,_n_,',');
  if ^index(b,strip(temp)) then want=catx(',',want,temp);
 end;
 drop temp;
run;

 

Patrick
Opal | Level 21

@sarav93 

Did any of the replies give you the answer you need. If so then please mark it as solution; If not please tell us what's still missing/not working.

Also: Please don't paste sample data as a screenshot but provide a SAS data step which creates such data like @ed_sas_member has done for you. That's just so we can spend the time resolving your challenge and provide tested code instead of spending the time creating sample data.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1758 views
  • 9 likes
  • 5 in conversation