Solved: Re: How to remove common values in 2 columns

rajdeep · Posted 10-16-2019 01:51 PM

Hi Folks,

I have two columns. I want to remove the common values and replace it with a space and want to store the result.

Emp_Accounts Sales_Accounts
239676,239690,240754 239676,240754

The result I need is:

Emp_Accounts Sales_Accounts Emp_Acc_Not_in_Sales Sales_Acc_Not_in_Emp
239676,239690,240754 239676,240754,234563 239690 234563

I have tried the below code, but I think I am missing something.

data test3(drop=i EMp_Count Sales_Count);
set test;

length want $100;
     length Emp_Acc_Not_in_Sales $40;
 length sales_Acc_Not_in_Emp $40;
Emp_Count=countw(Emp_Count);
Sales_Count=countw(Sales_Count);
if Emp_Count> Sales_Count then do;
retain Emp_Acc_Not_in_Sales sales_Acc_Not_in_Emp ;
do i=1 to countw(Emp_Accounts);
      string=strip(scan(Emp_Accounts, i, ','));
      if find(Emp_Accounts, string, 'it') > 0 then Emp_Acc_Not_in_Sales = tranwrd(Emp_Accounts, strip(string), ' ');
if find(Sales_Accounts, string, 'it') > 0 then sales_Acc_Not_in_Emp =tranwrd(Sales_Accounts, strip(string), ' ');
     end;
end;

run;

The above code is not holding the value in Emp_Acc_Not_in_Sales & sales_Acc_Not_in_Emp in the next iteration.

Please help me if possible.

Thanks in advance.

andreas_lds · Posted 10-25-2019 06:56 AM

All char vars are padded with blanks, so the last value returned by scan will have lots of blanks at its end and findw searches for "241149 ", which is not in the list. Trimming the result of scan should solve the issue.

if findw(sales_accounts, trim(scan(emp_accounts, i, ','))) = 0 then do;

View solution in original post

mkeintz · Posted 10-16-2019 02:15 PM

Instead of the FIND function, use the FINDW (find word) function. It will simplify your treatment of word delimiters. In fact, it assume the delimiters include comma's and blanks, which is what you need here:

data have;
  input Emp_Accounts :$20.      Sales_Accounts :$20. ;
  put (_all_) (=);
datalines;
239676,239690,240754   239676,240754
run;

data want (drop=i);
  set have;
  length not_in_sales_accounts not_in_emp_accounts $20;
  do i=1 to countw(emp_accounts);
    if findw(sales_accounts,scan(emp_accounts,i,',')) = 0 then 
    not_in_sales_accounts=catx(',',not_in_sales_accounts,scan(emp_accounts,i,','));
  end;
  do i=1 to countw(sales_accounts);
    if findw(emp_accounts,scan(sales_accounts,i,',')) = 0 then 
    not_in_emp_accounts=catx(',',not_in_emp_accounts,scan(sales_accounts,i,','));
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

rajdeep · Posted 10-24-2019 08:18 AM

Hi Folks,

Just one small ask. Please help.

I am running the below code with the following values, but the output is catching one account extra in not_in_emp_accounts.

data have;
  input Emp_Accounts :$200.      Sales_Accounts :$200. ;
  put (_all_) (=);
datalines;
239684,239789,240173,240262,240760,241149,241175,241240,241249,241691,241750   240740,241149
run;

data want (drop=i);
  set have;
  length not_in_sales_accounts not_in_emp_accounts $200;
  do i=1 to countw(emp_accounts);
    if findw(sales_accounts,scan(emp_accounts,i,',')) = 0 then 
    not_in_sales_accounts=catx(',',not_in_sales_accounts,scan(emp_accounts,i,','));
  end;
  do i=1 to countw(sales_accounts);
    if findw(emp_accounts,scan(sales_accounts,i,',')) = 0 then 
    not_in_emp_accounts=catx(',',not_in_emp_accounts,scan(sales_accounts,i,','));
  end;
run;

The output is coming like below:

not_in_sales_accounts not_in_emp_accounts
239684,239789,240173,240262,240760,241175,241240,241249,241691,241750 240740,241149

But you can see that 241149 is already present in emp_accounts.

Rather the output should come:

not_in_sales_accounts not_in_emp_accounts
239684,239789,240173,240262,240760,241175,241240,241249,241691,241750 240740

I hope you got my point. Can you please explain why it's catching the common account which is already present in both especially in this case?

Please help. Thanks in advance.

andreas_lds · Posted 10-25-2019 06:56 AM

All char vars are padded with blanks, so the last value returned by scan will have lots of blanks at its end and findw searches for "241149 ", which is not in the list. Trimming the result of scan should solve the issue.

if findw(sales_accounts, trim(scan(emp_accounts, i, ','))) = 0 then do;

rajdeep · Posted 10-25-2019 07:16 AM

Wow....Andreas_lds... It's working now......

You are great... I am so silly.... Hehe.....

Kudos to you.....Have a great day ahead... Thanks a lot.

rajdeep · Posted 09-08-2020 07:46 AM

Hi mkeintz,

Just an additional help is needed.

I just want to compare all the Depo_IDs for one group to Shop_IDs, if all the IDs matched then Comments should print "Matched" or else "Not Matched".

Pet_Names	Depo_Count	Depo_IDs	Shop_Count	Shop_IDs	Comments	Depo_Id_Not_In_Shop	Shop_Id_Not_in_Depo
Dog	2	6346560,6354972	1	6354972	Matched	6346560
Dog	1	6360302	2	6346560,6360302	Matched		6346560
Cat	1	6320478,6296089	1	6296088	Not Matched	6320478,6296089	6296088
Cat	3	6296088,6327084,6354370	3	6320478,6327084,6354370	Not Matched	6296088	6320478
Pigeon	1	6320478,6296089	0	.	Not Matched	6320478,6296089	6296088
Pigeon	3	6296088,6327084,6354370	3	6320478,6327084,6354370	Not Matched	6296088	6320478
Parrot	1	6327084	1	6327084	Matched

The above is the expected output.

Like, From Dog Category total 3 Depo_Ids are there and that has to get compared with 3 Shop_Ids from Dog Category. You can see if 3 vs 3 will be compared, token by token then it will be matched and Comments Column should go as "Matched".

Same like for Cat: 4 vs 4 will be compared

Pigeon: 4 vs 3 will be compared and so on.

In simple words, all the Depo_ids from one Pet_names group will be compared to Shop_ids from the same group, if matched then Comments=Matched else Not matched.

I have already done the same with almost 5 steps.

The below is my code:

FILENAME REFFILE 'C:\Pet_Depo.xlsx';
PROC IMPORT DATAFILE=REFFILE replace
	DBMS=XLSX
	OUT=WORK.Pet_Depo;
	GETNAMES=YES;
RUN;

proc sort data=Pet_Depo;
by Pet_Names;
run;

data grp_id_plt(drop=Comments);
set Pet_Depo ;
length Depo_Id_concat Shop_Id_concat $400;
by Pet_Names ;
retain Depo_Id_concat Shop_Id_concat ;
if first.Pet_Names then do ;
group_id =1;
Depo_Id_concat ='';
Shop_Id_concat ='';
end;
 do i=1 to countw(Depo_IDs);
    Depo_Id_concat=catx(';',strip(Depo_Id_concat),scan(Depo_IDs,i,','));
end;
do i=1 to countw(Shop_IDs);
	Shop_Id_concat =catx(";",strip(Shop_Id_concat),scan(Shop_IDs,i,','));
end;
group_id+1;
if last.Pet_Names then output;
run;

data Comments_holder;
length Comments $20;
set grp_id_plt;
set_con_shop = 'Matched';
do i=1 to countw(Shop_Id_concat,';');
if set_con_shop = 'Matched' then do;
con_shop = findw(Depo_Id_concat,strip(scan(Shop_Id_concat,i))) ;
if con_shop = 0 then set_con_shop = 'Not Matched';
end;
end;
set_con_depo = 'Matched';
do i=1 to countw(Depo_Id_concat,';');
if set_con_depo = 'Matched' then do;
con_depo = findw(Shop_Id_concat ,strip(scan(Depo_Id_concat,i))) ;
if con_depo = 0 then set_con_depo = 'Not Matched';
end;
end;
if set_con_shop = 'Matched' and set_con_depo ='Matched' then Comments = 'Matched';
else Comments = 'Not Matched';
drop set_con_shop set_con_depo con_shop con_depo i ;
run;

PROC SQL;
   CREATE TABLE WORK.Final AS 
   SELECT DISTINCT t1.Pet_Names, 
          
          t1.Depo_Count, 
          t1.Depo_IDs, 
          t1.Shop_Count, 
          t1.Shop_IDs, 
          t1.Depo_Id_Not_In_Shop, 
          t1.Shop_Id_Not_in_Depo, 
          t2.Comments
      FROM WORK.Pet_Depo t1
           LEFT JOIN WORK.Comments_HOLDER t2 ON (t1.Pet_Names = t2.Pet_Names);
QUIT;

Please suggest if any alternate way can I do the same or in lesser steps if it's possible suppose.

Thanks

ChrisNZ · Posted 10-16-2019 11:41 PM

The output does not match your description. You are not removing values, you add values.
Please state your goal more clearly.

High-Performance SAS Coding - Third Edition

rajdeep · Posted 10-17-2019 04:20 AM

Hi Chrisnz,

I think the sample example in the above clearly showing what should be the expected output. Yeah, the code I had written that's somehow was not doing as expected. I want to check the one column values with another and if match found then I just want to remove the same and keep the remaining.

I hope that helps for your understanding. Please let me know if still there's any disconnect.

Thanks for your response.

rajdeep · Posted 10-25-2019 05:15 AM

Hi Team,

Please help me if possible what I am missing in the above code.

Thanks in advance.

SAS Innovate 2025: Call for Content

Classroom Training Available!