I have a following dataset A
| id | date | code |
| 19075F106 | 4/29/2014 | a10,axau10,a1010,axau1010 |
| 6219640 | 4/29/2014 | n10,v10,x10,w10,n1010,v1010,x1010,w1010 |
| 6366007 | 4/29/2014 | s10,v10,s1010,v1010 |
| B63FY34 | 4/29/2014 | a10,axau10,a1010,axau1010 |
| 25179M103 | 4/29/2014 | a10,axau10,a1010,axau1010 |
Another dataset B
| date | code |
| 4/29/2014 | a10 |
| 4/29/2014 | a10 |
| 4/29/2014 | a10 |
| 4/29/2014 | x1010 |
| 4/29/2014 | w10 |
I would like to filter dataset A where the code is contained in dataset B. The resulting dataset should be the below. I am trying to use SQL contains but its not working.
Thanks for your help!
| id | date | code | newcode |
| 19075F106 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
| 6219640 | 4/29/2014 | n10,v10,x10,w10,n1010,v1010,x1010,w1010 | x1010, w10 |
| B63FY34 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
| 25179M103 | 4/29/2014 | a10,axau10,a1010,axau1010 | a10 |
Post your code.
From I can see ( well I could be way off ), even though your problem seems to hint some sort of Cartesian Product approach, this is still not a good candidate for Proc SQL, because Catenate functions are NOT supported to operate across the rows. On the other hand, Data step can be more resourceful using Hash or Point=. The following is an old school approach using Point=;
data a;
infile cards truncover dlm='09'x;
informat id $20.
date mmddyy10.
code $50.;
input id date code;
format date mmddyy10.;
cards;
19075F106 4/29/2014 a10,axau10,a1010,axau1010
6219640 4/29/2014 n10,v10,x10,w10,n1010,v1010,x1010,w1010
6366007 4/29/2014 s10,v10,s1010,v1010
B63FY34 4/29/2014 a10,axau10,a1010,axau1010
25179M103 4/29/2014 a10,axau10,a1010,axau1010
;
data B;
infile cards truncover dlm='09'x;
informat date mmddyy10. code $8.;
input date code;
format date mmddyy10.;
cards;
4/29/2014 a10
4/29/2014 a10
4/29/2014 a10
4/29/2014 x1010
4/29/2014 w10
;
data _null_;
call symputx('nobs',nobs);
set b nobs=nobs;
stop;
run;
data want;
set a;
length new_code $ 100;
array _cd (&nobs)/*arbituary*/ $ 20;
do i=1 to &nobs;
set b (rename=code=b_code keep=code) point=i;
if findw(code,strip(b_code))>0 and strip(b_code) not in _cd
then _cd(i)=b_code;
end;
new_code=catx(',', of _cd(*));
if not missing (new_code);
drop _cd: b_code;
run;
Haikuo
I don't think so. Cartesian Product of Data step can't compete SQL's at all. Yours is very very slowly .
data a;
infile cards truncover ;
informat id $20.
date mmddyy10.
code $50.;
input id date code;
format date mmddyy10.;
cards;
19075F106 4/29/2014 a10,axau10,a1010,axau1010
6219640 4/29/2014 n10,v10,x10,w10,n1010,v1010,x1010,w1010
6366007 4/29/2014 s10,v10,s1010,v1010
B63FY34 4/29/2014 a10,axau10,a1010,axau1010
25179M103 4/29/2014 a10,axau10,a1010,axau1010
;
data B;
infile cards truncover ;
informat date mmddyy10. code $8.;
input date code;
format date mmddyy10.;
cards;
4/29/2014 a10
4/29/2014 a10
4/29/2014 a10
4/29/2014 x1010
4/29/2014 w10
;
proc sort data=b out=bb nodupkey;by date code;run;
proc sql;
create table temp as
select a.*,bb.code as new_code
from a,bb
where a.date=bb.date and a.code contains strip(bb.code)
order by id,date,code;
quit;
data want;
set temp;
by id date code;
length _code $ 40;
retain _code;
_code=catx(',',_code,new_code);
if last.code then do;output;call missing(_code);end;
drop new_code;
run;
Xia Keshan
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.