I have a following dataset A
id | date | code |
19075F106 | 4/29/2014 | a10,axau10,a1010,axau1010 |
6219640 | 4/29/2014 | n10,v10,x10,w10,n1010,v1010,x1010,w1010 |
6366007 | 4/29/2014 | s10,v10,s1010,v1010 |
B63FY34 | 4/29/2014 | a10,axau10,a1010,axau1010 |
25179M103 | 4/29/2014 | a10,axau10,a1010,axau1010 |
Another dataset B
date | code |
4/29/2014 | a10 |
4/29/2014 | a10 |
4/29/2014 | a10 |
4/29/2014 | x1010 |
4/29/2014 | w10 |
I would like to filter dataset A where the code is contained in dataset B. The resulting dataset should be the below. I am trying to use SQL contains but its not working.
Thanks for your help!
id | date | code | newcode |
19075F106 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
6219640 | 4/29/2014 | n10,v10,x10,w10,n1010,v1010,x1010,w1010 | x1010, w10 |
B63FY34 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
25179M103 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
Post your code.
From I can see ( well I could be way off ), even though your problem seems to hint some sort of Cartesian Product approach, this is still not a good candidate for Proc SQL, because Catenate functions are NOT supported to operate across the rows. On the other hand, Data step can be more resourceful using Hash or Point=. The following is an old school approach using Point=;
data a;
infile cards truncover dlm='09'x;
informat id $20.
date mmddyy10.
code $50.;
input id date code;
format date mmddyy10.;
cards;
19075F106 4/29/2014 a10,axau10,a1010,axau1010
6219640 4/29/2014 n10,v10,x10,w10,n1010,v1010,x1010,w1010
6366007 4/29/2014 s10,v10,s1010,v1010
B63FY34 4/29/2014 a10,axau10,a1010,axau1010
25179M103 4/29/2014 a10,axau10,a1010,axau1010
;
data B;
infile cards truncover dlm='09'x;
informat date mmddyy10. code $8.;
input date code;
format date mmddyy10.;
cards;
4/29/2014 a10
4/29/2014 a10
4/29/2014 a10
4/29/2014 x1010
4/29/2014 w10
;
data _null_;
call symputx('nobs',nobs);
set b nobs=nobs;
stop;
run;
data want;
set a;
length new_code $ 100;
array _cd (&nobs)/*arbituary*/ $ 20;
do i=1 to &nobs;
set b (rename=code=b_code keep=code) point=i;
if findw(code,strip(b_code))>0 and strip(b_code) not in _cd
then _cd(i)=b_code;
end;
new_code=catx(',', of _cd(*));
if not missing (new_code);
drop _cd: b_code;
run;
Haikuo
I don't think so. Cartesian Product of Data step can't compete SQL's at all. Yours is very very slowly .
data a; infile cards truncover ; informat id $20. date mmddyy10. code $50.; input id date code; format date mmddyy10.; cards; 19075F106 4/29/2014 a10,axau10,a1010,axau1010 6219640 4/29/2014 n10,v10,x10,w10,n1010,v1010,x1010,w1010 6366007 4/29/2014 s10,v10,s1010,v1010 B63FY34 4/29/2014 a10,axau10,a1010,axau1010 25179M103 4/29/2014 a10,axau10,a1010,axau1010 ; data B; infile cards truncover ; informat date mmddyy10. code $8.; input date code; format date mmddyy10.; cards; 4/29/2014 a10 4/29/2014 a10 4/29/2014 a10 4/29/2014 x1010 4/29/2014 w10 ; proc sort data=b out=bb nodupkey;by date code;run; proc sql; create table temp as select a.*,bb.code as new_code from a,bb where a.date=bb.date and a.code contains strip(bb.code) order by id,date,code; quit; data want; set temp; by id date code; length _code $ 40; retain _code; _code=catx(',',_code,new_code); if last.code then do;output;call missing(_code);end; drop new_code; run;
Xia Keshan
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.