@Nicholasamh Welcome to the SAS Community! 🙂
data have;
input ID Transport $ 3-103;
infile datalines truncover;
datalines;
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Airplane GA
2 Japan Airplane GA
;
data want(drop=i j a);
array _{9999} $ 100 _temporary_;
do i=1 by 1 until (last.id);
set have;
by id;
_[i]=Transport;
end;
do until (last.id);
set have;
by id;
a=0;
do j=1 to dim(_);
a=max(a, (find(_[j], transport, 'it') & _[j] ne transport));
if j=dim(_) & a=0 then output;
end;
end;
run;
Result:
ID Transport 1 Singapore Car A 1 UK Van CA 2 Japan Airplane GA
Welcome to the SAS communities.
Your problem is not trivial and requires a bit more than novice level coding. I couldn't think of something simpler than below two coding options (one datastep, one SQL).
data have;
infile datalines truncover;
input Group_ID Transport $40.;
datalines;
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Japan Airplane GA
2 Airplane GA
;
/* SAS datastep using Hash lookup */
data want(drop=_:);
if _n_=1 then
do;
if 0 then set have(rename=(Transport=_Transport));
dcl hash h1(dataset:'have(rename=(Transport=_Transport))', multidata:'y');
h1.defineKey('Group_ID');
h1.defineData('_Transport');
h1.defineDone();
end;
set have;
do while(h1.do_over() = 0);
if Transport=_Transport then continue;
if find(_Transport,Transport,'it')>0 then
do;
delete;
end;
end;
run;
proc print data=want;
run;
/* SQL using EXIST clause */
proc sql feedback;
/* create table want2 as*/
select
o.*
from have o
where not exists
(
select *
from have i
where
i.group_id=o.group_id
and i.transport ne o.transport
and find(i.Transport,o.Transport,'it')>0
)
;
quit;
If you don't understand the code then search in the SAS docu what you find about the Hash object and the SQL EXIST clause.
If you're new to SQL then not sure which of the two options is easier to understand for you. If you've got already some familiarity with SQL then it's likely the EXIST clause.
@Nicholasamh Welcome to the SAS Community! 🙂
data have;
input ID Transport $ 3-103;
infile datalines truncover;
datalines;
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Airplane GA
2 Japan Airplane GA
;
data want(drop=i j a);
array _{9999} $ 100 _temporary_;
do i=1 by 1 until (last.id);
set have;
by id;
_[i]=Transport;
end;
do until (last.id);
set have;
by id;
a=0;
do j=1 to dim(_);
a=max(a, (find(_[j], transport, 'it') & _[j] ne transport));
if j=dim(_) & a=0 then output;
end;
end;
run;
Result:
ID Transport 1 Singapore Car A 1 UK Van CA 2 Japan Airplane GA
The simplest solution is probably something like this:
proc sql;
create table want as
select * from have base where not exists(
select * from have where id=base.id and Transport ne base.Transport
and indexw(Transport,trim(base.transport))>0
)
;
data have;
input ID Transport $ 3-103;
infile datalines truncover;
datalines;
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Airplane GA
2 Japan Airplane GA
;
proc sql;
create table want as
select distinct a.*
from have as a,have as b
where a.id=b.id and a.Transport ne b.Transport
and a.Transport contains strip(b.Transport);
quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.