sql contains

Reply
Frequent Contributor
Frequent Contributor
Posts: 79

sql contains

I have a following dataset A

iddatecode
19075F1064/29/2014a10,axau10,a1010,axau1010
62196404/29/2014n10,v10,x10,w10,n1010,v1010,x1010,w1010
63660074/29/2014s10,v10,s1010,v1010
B63FY344/29/2014a10,axau10,a1010,axau1010
25179M1034/29/2014

a10,axau10,a1010,axau1010

Another dataset B

datecode
4/29/2014a10
4/29/2014a10
4/29/2014a10
4/29/2014x1010
4/29/2014w10

I would like to filter dataset A where the code is contained in dataset B. The resulting dataset should be the below. I am trying to use SQL contains but its not working.

Thanks for your help!

iddatecodenewcode
19075F1064/29/2014a10,axau10,a1010,axau1010a10
62196404/29/2014n10,v10,x10,w10,n1010,v1010,x1010,w1010x1010, w10
B63FY344/29/2014a10,axau10,a1010,axau1010a10
25179M1034/29/2014a10,axau10,a1010,axau1010a10
Super User
Posts: 19,789

Re: sql contains

Post your code.

Respected Advisor
Posts: 3,156

Re: sql contains

From I can see ( well I could be way off ), even though your problem seems to hint some sort of Cartesian Product approach,  this is still not a good candidate for Proc SQL, because Catenate functions are NOT supported to operate across the rows. On the other hand, Data step can be more resourceful using Hash or Point=. The following is an old school approach using Point=;

data a;

infile cards truncover dlm='09'x;

informat   id $20.

          date mmddyy10.  

          code $50.;

input id date code;

format date mmddyy10.;

cards;

19075F106  4/29/2014  a10,axau10,a1010,axau1010

6219640    4/29/2014  n10,v10,x10,w10,n1010,v1010,x1010,w1010

6366007    4/29/2014  s10,v10,s1010,v1010

B63FY34    4/29/2014  a10,axau10,a1010,axau1010

25179M103  4/29/2014  a10,axau10,a1010,axau1010

;

data B;

infile cards truncover dlm='09'x;

informat date mmddyy10.    code $8.;

input date code;

format date mmddyy10.;

cards;

4/29/2014  a10

4/29/2014  a10

4/29/2014  a10

4/29/2014  x1010

4/29/2014  w10

;

data _null_;

call symputx('nobs',nobs);

set b nobs=nobs;

stop;

run;

data want;

set a;

length new_code $ 100;

array _cd (&nobs)/*arbituary*/ $ 20;

do i=1 to &nobs;

       set b (rename=code=b_code keep=code) point=i;

        if findw(code,strip(b_code))>0 and strip(b_code) not in _cd

         then _cd(i)=b_code;

     end;

     new_code=catx(',', of _cd(*));

     if not missing (new_code);

     drop _cd: b_code;

run;

    

Haikuo

Super User
Posts: 10,028

Re: sql contains

I don't think so. Cartesian Product of Data step can't compete SQL's at all. Yours is very very slowly .

data a;
infile cards truncover ;
informat   id $20.
          date mmddyy10.  
          code $50.;
input id date code;
format date mmddyy10.;
cards;
19075F106  4/29/2014  a10,axau10,a1010,axau1010
6219640    4/29/2014  n10,v10,x10,w10,n1010,v1010,x1010,w1010
6366007    4/29/2014  s10,v10,s1010,v1010
B63FY34    4/29/2014  a10,axau10,a1010,axau1010
25179M103  4/29/2014  a10,axau10,a1010,axau1010
;
 
data B;
infile cards truncover      ;
informat date mmddyy10.    code $8.;
input date code;
format date mmddyy10.;
cards;
4/29/2014  a10
4/29/2014  a10
4/29/2014  a10
4/29/2014  x1010
4/29/2014  w10
;
proc sort data=b out=bb nodupkey;by date code;run;
proc sql;
create table temp as 
 select a.*,bb.code as new_code
  from a,bb
   where a.date=bb.date and a.code contains strip(bb.code)
    order by id,date,code;
quit;
data want;
 set temp;
 by id date code;
 length _code $ 40;
 retain _code;
 _code=catx(',',_code,new_code);
 if last.code then do;output;call missing(_code);end;
 drop new_code;
run;

Xia Keshan

Ask a Question
Discussion stats
  • 3 replies
  • 262 views
  • 0 likes
  • 4 in conversation