Re: sql contains

hdg · Posted 04-30-2014 01:41 PM

I have a following dataset A

id	date	code
19075F106	4/29/2014	a10,axau10,a1010,axau1010
6219640	4/29/2014	n10,v10,x10,w10,n1010,v1010,x1010,w1010
6366007	4/29/2014	s10,v10,s1010,v1010
B63FY34	4/29/2014	a10,axau10,a1010,axau1010
25179M103	4/29/2014	a10,axau10,a1010,axau1010

Another dataset B

date	code
4/29/2014	a10
4/29/2014	a10
4/29/2014	a10
4/29/2014	x1010
4/29/2014	w10

I would like to filter dataset A where the code is contained in dataset B. The resulting dataset should be the below. I am trying to use SQL contains but its not working.

Thanks for your help!

id	date	code	newcode
19075F106	4/29/2014	a10,axau10,a1010,axau1010	a10
6219640	4/29/2014	n10,v10,x10,w10,n1010,v1010,x1010,w1010	x1010, w10
B63FY34	4/29/2014	a10,axau10,a1010,axau1010	a10
25179M103	4/29/2014	a10,axau10,a1010,axau1010	a10

Reeza · Posted 04-30-2014 01:45 PM

Post your code.

Haikuo · Posted 04-30-2014 02:42 PM

From I can see ( well I could be way off ), even though your problem seems to hint some sort of Cartesian Product approach, this is still not a good candidate for Proc SQL, because Catenate functions are NOT supported to operate across the rows. On the other hand, Data step can be more resourceful using Hash or Point=. The following is an old school approach using Point=;

data a;

infile cards truncover dlm='09'x;

informat id $20.

date mmddyy10.

code $50.;

input id date code;

format date mmddyy10.;

cards;

19075F106 4/29/2014 a10,axau10,a1010,axau1010

6219640 4/29/2014 n10,v10,x10,w10,n1010,v1010,x1010,w1010

6366007 4/29/2014 s10,v10,s1010,v1010

B63FY34 4/29/2014 a10,axau10,a1010,axau1010

25179M103 4/29/2014 a10,axau10,a1010,axau1010

;

data B;

infile cards truncover dlm='09'x;

informat date mmddyy10. code $8.;

input date code;

format date mmddyy10.;

cards;

4/29/2014 a10

4/29/2014 x1010

4/29/2014 w10

;

data _null_;

call symputx('nobs',nobs);

set b nobs=nobs;

stop;

run;

data want;

set a;

length new_code $ 100;

array _cd (&nobs)/*arbituary*/ $ 20;

do i=1 to &nobs;

set b (rename=code=b_code keep=code) point=i;

if findw(code,strip(b_code))>0 and strip(b_code) not in _cd

then _cd(i)=b_code;

end;

new_code=catx(',', of _cd(*));

if not missing (new_code);

drop _cd: b_code;

run;

Haikuo

Ksharp · Posted 05-01-2014 05:57 AM

I don't think so. Cartesian Product of Data step can't compete SQL's at all. Yours is very very slowly .

data a;
infile cards truncover ;
informat   id $20.
          date mmddyy10.  
          code $50.;
input id date code;
format date mmddyy10.;
cards;
19075F106  4/29/2014  a10,axau10,a1010,axau1010
6219640    4/29/2014  n10,v10,x10,w10,n1010,v1010,x1010,w1010
6366007    4/29/2014  s10,v10,s1010,v1010
B63FY34    4/29/2014  a10,axau10,a1010,axau1010
25179M103  4/29/2014  a10,axau10,a1010,axau1010
;
 
data B;
infile cards truncover      ;
informat date mmddyy10.    code $8.;
input date code;
format date mmddyy10.;
cards;
4/29/2014  a10
4/29/2014  a10
4/29/2014  a10
4/29/2014  x1010
4/29/2014  w10
;
proc sort data=b out=bb nodupkey;by date code;run;
proc sql;
create table temp as 
 select a.*,bb.code as new_code
  from a,bb
   where a.date=bb.date and a.code contains strip(bb.code)
    order by id,date,code;
quit;
data want;
 set temp;
 by id date code;
 length _code $ 40;
 retain _code;
 _code=catx(',',_code,new_code);
 if last.code then do;output;call missing(_code);end;
 drop new_code;
run;

Xia Keshan

sql contains

Re: sql contains

Re: sql contains

Re: sql contains

Click image to register for webinar

Classroom Training Available!