DATA Step, Macro, Functions and more

question on substring

Accepted Solution Solved
Reply
Frequent Contributor
Frequent Contributor
Posts: 79
Accepted Solution

question on substring

I was wondering if someone could help me solve this problem maybe using sql (open to other solutions that are fast)

 

I have a table A 

 

code

35

10

20

50

3540

 

table B

 

code

35,90

36,3540

72

10,1010

 

I would like to filter codes in table B by those available in table A, only one code can be selected for each row.

Table b is separated by commas

 

The output I want is the below

 

table B

code          codenew

35,90          35

36,3540      3540

72

10,1010        10

 

 

Thanks so much!

H


Accepted Solutions
Solution
‎08-23-2016 10:42 AM
Respected Advisor
Posts: 3,899

Re: question on substring

Assuming the "codes" variables are character below could work as well

proc sql;
  create table want as
    select b.code_s, a.code
    from b left join a
    on find(b.code_s,a.code,'t')>=1
  ;
quit;

View solution in original post


All Replies
Super User
Posts: 17,868

Re: question on substring

I think a data step is better in this case.

 

1. Load table A into a temporary array

2. Loop through string searching all values of temporary array. What happens if multiple matches?

 

This is a fairly similar idea, except rather than comparing to values in the temporary array you're checking for equality and then you can create new variable as needed.

 

https://gist.github.com/statgeek/f052b5223fecca066b1f

Super Contributor
Posts: 255

Re: question on substring

Here is Hash approach. You can do this using array too. It is easy to get CODE instead of code_new. If you insist on this name, it is still possible.

 

data a;
input code;
datalines;
35
10
20
50
3540
;
run;

data b;
input code_s $20.;
datalines; 
35,90
36,3540
72
10,1010
;
run;

data want;
   if _n_ = 1 then do;
      if 0 then set a;
      declare hash h(dataset:'a');
      h.definekey('code');
      h.definedone();
   end;
   set b;
   nw = countw(code_s, ',');
   do i = 1 to nw;
      code = input(scan(code_s, i, ','), 8.);
      if h.find() = 0 then output;
   end;
drop i nw;
run;

proc print data = want;
run;
Respected Advisor
Posts: 4,654

Re: question on substring

A SQL solution could be:

 

proc sql;
create table want as
select b.code_s, a.code
from 
    b left join 
    a 
        on a.code = input(scan(b.code_s,1,","),best.) or
           a.code = input(scan(b.code_s,2,","),best.);
quit;

but join conditions involving the OR operator are notoriously inefficient in SAS/SQL. On the other hand, if you are willing to store codes as in your B table, you can't be very worried about efficiency Smiley Happy.

PG
Solution
‎08-23-2016 10:42 AM
Respected Advisor
Posts: 3,899

Re: question on substring

Assuming the "codes" variables are character below could work as well

proc sql;
  create table want as
    select b.code_s, a.code
    from b left join a
    on find(b.code_s,a.code,'t')>=1
  ;
quit;
Frequent Contributor
Frequent Contributor
Posts: 79

Re: question on substring

Thank you all for your help. I have a follow up on this question if my table B is

 

table B

 

code

a35,a90

a36,a3540

zz72

a10,x1010

 

 

 

The output I want is the below to only pick those where the codes are available in table A

 

table B

code          codenew

35,90          a35

36,3540      a3540

72

10,1010        a10

 

I tried with find and substring but I am not getting hte answer I want.

Super User
Posts: 17,868

Re: question on substring

Start a new question. Make sure to highlight how it differs and what doesn't work from current solution. 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 394 views
  • 4 likes
  • 5 in conversation