I have a following dataset A
id | date | code |
19075F106 | 4/29/2014 | a10,axau10,a1010,axau1010 |
6219640 | 4/29/2014 | n10,v10,x10,w10,n1010,v1010,x1010,w1010 |
6366007 | 4/29/2014 | s10,v10,s1010,v1010 |
B63FY34 | 4/29/2014 | a10,axau10,a1010,axau1010 |
25179M103 | 4/29/2014 | a10,axau10,a1010,axau1010 |
Another dataset B
date | code |
4/29/2014 | a10 |
4/29/2014 | a10 |
4/29/2014 | a10 |
4/29/2014 | x1010 |
4/29/2014 | w10 |
I would like to filter dataset A where the code is contained in dataset B. The resulting dataset should be the below. I am trying to use SQL contains but its not working.
Thanks for your help!
id | date | code | newcode |
19075F106 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
6219640 | 4/29/2014 | n10,v10,x10,w10,n1010,v1010,x1010,w1010 | x1010, w10 |
B63FY34 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
25179M103 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
Post your code.
From I can see ( well I could be way off ), even though your problem seems to hint some sort of Cartesian Product approach, this is still not a good candidate for Proc SQL, because Catenate functions are NOT supported to operate across the rows. On the other hand, Data step can be more resourceful using Hash or Point=. The following is an old school approach using Point=;
data a;
infile cards truncover dlm='09'x;
informat id $20.
date mmddyy10.
code $50.;
input id date code;
format date mmddyy10.;
cards;
19075F106 4/29/2014 a10,axau10,a1010,axau1010
6219640 4/29/2014 n10,v10,x10,w10,n1010,v1010,x1010,w1010
6366007 4/29/2014 s10,v10,s1010,v1010
B63FY34 4/29/2014 a10,axau10,a1010,axau1010
25179M103 4/29/2014 a10,axau10,a1010,axau1010
;
data B;
infile cards truncover dlm='09'x;
informat date mmddyy10. code $8.;
input date code;
format date mmddyy10.;
cards;
4/29/2014 a10
4/29/2014 a10
4/29/2014 a10
4/29/2014 x1010
4/29/2014 w10
;
data _null_;
call symputx('nobs',nobs);
set b nobs=nobs;
stop;
run;
data want;
set a;
length new_code $ 100;
array _cd (&nobs)/*arbituary*/ $ 20;
do i=1 to &nobs;
set b (rename=code=b_code keep=code) point=i;
if findw(code,strip(b_code))>0 and strip(b_code) not in _cd
then _cd(i)=b_code;
end;
new_code=catx(',', of _cd(*));
if not missing (new_code);
drop _cd: b_code;
run;
Haikuo
I don't think so. Cartesian Product of Data step can't compete SQL's at all. Yours is very very slowly .
data a; infile cards truncover ; informat id $20. date mmddyy10. code $50.; input id date code; format date mmddyy10.; cards; 19075F106 4/29/2014 a10,axau10,a1010,axau1010 6219640 4/29/2014 n10,v10,x10,w10,n1010,v1010,x1010,w1010 6366007 4/29/2014 s10,v10,s1010,v1010 B63FY34 4/29/2014 a10,axau10,a1010,axau1010 25179M103 4/29/2014 a10,axau10,a1010,axau1010 ; data B; infile cards truncover ; informat date mmddyy10. code $8.; input date code; format date mmddyy10.; cards; 4/29/2014 a10 4/29/2014 a10 4/29/2014 a10 4/29/2014 x1010 4/29/2014 w10 ; proc sort data=b out=bb nodupkey;by date code;run; proc sql; create table temp as select a.*,bb.code as new_code from a,bb where a.date=bb.date and a.code contains strip(bb.code) order by id,date,code; quit; data want; set temp; by id date code; length _code $ 40; retain _code; _code=catx(',',_code,new_code); if last.code then do;output;call missing(_code);end; drop new_code; run;
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.