BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rajdeep
Pyrite | Level 9

Hi Folks,

 

I have two columns. I want to remove the common values and replace it with a space and want to store the result.

 

 Emp_Accounts                Sales_Accounts
239676,239690,240754   239676,240754

 

The result I need is:

 

 Emp_Accounts                Sales_Accounts                    Emp_Acc_Not_in_Sales        Sales_Acc_Not_in_Emp
239676,239690,240754   239676,240754,234563      239690                                      234563

 

 

I have tried the below code, but I think I am missing something.

 

data test3(drop=i EMp_Count Sales_Count);
set test;

length want $100;
     length Emp_Acc_Not_in_Sales $40;
 length sales_Acc_Not_in_Emp $40;
Emp_Count=countw(Emp_Count);
Sales_Count=countw(Sales_Count);
if Emp_Count> Sales_Count then do;
retain Emp_Acc_Not_in_Sales sales_Acc_Not_in_Emp ;
do i=1 to countw(Emp_Accounts);
      string=strip(scan(Emp_Accounts, i, ','));
      if find(Emp_Accounts, string, 'it') > 0 then Emp_Acc_Not_in_Sales = tranwrd(Emp_Accounts, strip(string), ' ');
if find(Sales_Accounts, string, 'it') > 0 then sales_Acc_Not_in_Emp =tranwrd(Sales_Accounts, strip(string), ' ');
     end;
end;

run;

     The above code is not holding the value in Emp_Acc_Not_in_Sales & sales_Acc_Not_in_Emp in the next iteration.

Please help me if possible.

Thanks in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

All char vars are padded with blanks, so the last value returned by scan will have lots of blanks at its end and findw searches for "241149          ", which is not in the list. Trimming the result of scan should solve the issue.

 

if findw(sales_accounts, trim(scan(emp_accounts, i, ','))) = 0 then do;

View solution in original post

8 REPLIES 8
mkeintz
PROC Star

Instead of the FIND function, use the FINDW (find word) function.  It will simplify your treatment of word delimiters.  In fact, it assume the delimiters include comma's and blanks, which is what you need here:

 

data have;
  input Emp_Accounts :$20.      Sales_Accounts :$20. ;
  put (_all_) (=);
datalines;
239676,239690,240754   239676,240754
run;

data want (drop=i);
  set have;
  length not_in_sales_accounts not_in_emp_accounts $20;
  do i=1 to countw(emp_accounts);
    if findw(sales_accounts,scan(emp_accounts,i,',')) = 0 then 
    not_in_sales_accounts=catx(',',not_in_sales_accounts,scan(emp_accounts,i,','));
  end;
  do i=1 to countw(sales_accounts);
    if findw(emp_accounts,scan(sales_accounts,i,',')) = 0 then 
    not_in_emp_accounts=catx(',',not_in_emp_accounts,scan(sales_accounts,i,','));
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
rajdeep
Pyrite | Level 9

Hi Folks,

 

 

 

 

data have;
  input Emp_Accounts :$200.      Sales_Accounts :$200. ;
  put (_all_) (=);
datalines;
239684,239789,240173,240262,240760,241149,241175,241240,241249,241691,241750   240740,241149
run;

data want (drop=i);
  set have;
  length not_in_sales_accounts not_in_emp_accounts $200;
  do i=1 to countw(emp_accounts);
    if findw(sales_accounts,scan(emp_accounts,i,',')) = 0 then 
    not_in_sales_accounts=catx(',',not_in_sales_accounts,scan(emp_accounts,i,','));
  end;
  do i=1 to countw(sales_accounts);
    if findw(emp_accounts,scan(sales_accounts,i,',')) = 0 then 
    not_in_emp_accounts=catx(',',not_in_emp_accounts,scan(sales_accounts,i,','));
  end;
run;

The output is coming like below:

 

not_in_sales_accounts                                                                                                 not_in_emp_accounts
239684,239789,240173,240262,240760,241175,241240,241249,241691,241750     240740,241149

 

But you can see that 241149 is already present in emp_accounts.

 

Rather the output should come:

 not_in_sales_accounts                                                                                                 not_in_emp_accounts
239684,239789,240173,240262,240760,241175,241240,241249,241691,241750     240740

 

I hope you got my point. Can you please explain why it's catching the common account which is already present in both especially in this case?

 

Please help. Thanks in advance.

andreas_lds
Jade | Level 19

All char vars are padded with blanks, so the last value returned by scan will have lots of blanks at its end and findw searches for "241149          ", which is not in the list. Trimming the result of scan should solve the issue.

 

if findw(sales_accounts, trim(scan(emp_accounts, i, ','))) = 0 then do;
rajdeep
Pyrite | Level 9

Wow....Andreas_lds... It's working now......

 

 

rajdeep
Pyrite | Level 9

Hi mkeintz,

Just an additional help is needed. 

I just want to compare all the Depo_IDs for one group to Shop_IDs, if all the IDs matched then Comments should print "Matched" or else "Not Matched".

 

Pet_Names Depo_Count Depo_IDs Shop_Count Shop_IDs Comments Depo_Id_Not_In_Shop Shop_Id_Not_in_Depo
Dog 2 6346560,6354972 1 6354972 Matched 6346560  
Dog 1 6360302 2 6346560,6360302 Matched   6346560
Cat 1 6320478,6296089 1 6296088 Not Matched 6320478,6296089 6296088
Cat 3 6296088,6327084,6354370 3 6320478,6327084,6354370 Not Matched 6296088 6320478
Pigeon 1 6320478,6296089 0 . Not Matched 6320478,6296089 6296088
Pigeon 3 6296088,6327084,6354370 3 6320478,6327084,6354370 Not Matched 6296088 6320478
Parrot 1 6327084 1 6327084 Matched    

 

The above is the expected output.

Like, From Dog Category total 3 Depo_Ids are there and that has to get compared with 3 Shop_Ids from Dog Category. You can see if 3 vs 3 will be compared, token by token then it will be matched and Comments Column should go as "Matched".

 

Same like for Cat: 4 vs 4 will be compared

Pigeon: 4 vs 3 will be compared and so on.

In simple words, all the Depo_ids from one Pet_names group will be compared to Shop_ids from the same group, if matched then Comments=Matched else Not matched.

 

I have already done the same with almost 5 steps.

The below is my code:

FILENAME REFFILE 'C:\Pet_Depo.xlsx';
PROC IMPORT DATAFILE=REFFILE replace
	DBMS=XLSX
	OUT=WORK.Pet_Depo;
	GETNAMES=YES;
RUN;

proc sort data=Pet_Depo;
by Pet_Names;
run;

data grp_id_plt(drop=Comments);
set Pet_Depo ;
length Depo_Id_concat Shop_Id_concat $400;
by Pet_Names ;
retain Depo_Id_concat Shop_Id_concat ;
if first.Pet_Names then do ;
group_id =1;
Depo_Id_concat ='';
Shop_Id_concat ='';
end;
 do i=1 to countw(Depo_IDs);
    Depo_Id_concat=catx(';',strip(Depo_Id_concat),scan(Depo_IDs,i,','));
end;
do i=1 to countw(Shop_IDs);
	Shop_Id_concat =catx(";",strip(Shop_Id_concat),scan(Shop_IDs,i,','));
end;
group_id+1;
if last.Pet_Names then output;
run;

data Comments_holder;
length Comments $20;
set grp_id_plt;
set_con_shop = 'Matched';
do i=1 to countw(Shop_Id_concat,';');
if set_con_shop = 'Matched' then do;
con_shop = findw(Depo_Id_concat,strip(scan(Shop_Id_concat,i))) ;
if con_shop = 0 then set_con_shop = 'Not Matched';
end;
end;
set_con_depo = 'Matched';
do i=1 to countw(Depo_Id_concat,';');
if set_con_depo = 'Matched' then do;
con_depo = findw(Shop_Id_concat ,strip(scan(Depo_Id_concat,i))) ;
if con_depo = 0 then set_con_depo = 'Not Matched';
end;
end;
if set_con_shop = 'Matched' and set_con_depo ='Matched' then Comments = 'Matched';
else Comments = 'Not Matched';
drop set_con_shop set_con_depo con_shop con_depo i ;
run;

PROC SQL;
   CREATE TABLE WORK.Final AS 
   SELECT DISTINCT t1.Pet_Names, 
          
          t1.Depo_Count, 
          t1.Depo_IDs, 
          t1.Shop_Count, 
          t1.Shop_IDs, 
          t1.Depo_Id_Not_In_Shop, 
          t1.Shop_Id_Not_in_Depo, 
          t2.Comments
      FROM WORK.Pet_Depo t1
           LEFT JOIN WORK.Comments_HOLDER t2 ON (t1.Pet_Names = t2.Pet_Names);
QUIT;

Please suggest if any alternate way can I do the same or in lesser steps if it's possible suppose.

Thanks

ChrisNZ
Tourmaline | Level 20

The output does not match your description. You are not removing values, you add values.
Please state your goal more clearly.

rajdeep
Pyrite | Level 9

Hi Chrisnz,

 

I think the sample example in the above clearly showing what should be the expected output. Yeah, the code I had written that's somehow was not doing as expected. I want to check the one column values with another and if match found then I just want to remove the same and keep the remaining.

 

I hope that helps for your understanding. Please let me know if still there's any disconnect.

 

Thanks for your response.

rajdeep
Pyrite | Level 9

Hi Team,

 

Please help me if possible what I am missing in the above code.

 

Thanks in advance.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1165 views
  • 0 likes
  • 4 in conversation