BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Nicholasamh
Calcite | Level 5
Hi, I am new to SAS code. I have like to remove those data that are consist of the same words to each other based on ID. Really appreciate someone can help.
Example:
ID Transport
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Japan Airplane GA
2 Airplane GA
.......
Expected outcome:
ID Transport
1 Singapore Car A
1 UK Van CA
2 Japan Airplane GA
1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

@Nicholasamh Welcome to the SAS Community! 🙂

 

data have;
input ID Transport $ 3-103;
infile datalines truncover;
datalines;
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Airplane GA
2 Japan Airplane GA
;

data want(drop=i j a);
    array _{9999} $ 100 _temporary_;
    
    do i=1 by 1 until (last.id);
        set have;
        by id;
        _[i]=Transport;
    end;

    do until (last.id);
        set have;
        by id;
        a=0;
        do j=1 to dim(_);
           a=max(a, (find(_[j], transport, 'it') & _[j] ne transport));
           if j=dim(_) & a=0 then output;
        end;
    end;
run;

Result:

 

ID  Transport
1   Singapore Car A
1   UK Van CA
2   Japan Airplane GA

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

Welcome to the SAS communities. 

 

Your problem is not trivial and requires a bit more than novice level coding. I couldn't think of something simpler than below two coding options (one datastep, one SQL). 

data have;
  infile datalines truncover;
  input Group_ID Transport $40.;
  datalines;
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Japan Airplane GA
2 Airplane GA
;

/* SAS datastep using Hash lookup */
data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set have(rename=(Transport=_Transport));
      dcl hash h1(dataset:'have(rename=(Transport=_Transport))', multidata:'y');
      h1.defineKey('Group_ID');
      h1.defineData('_Transport');
      h1.defineDone();
    end;
  set have;

  do while(h1.do_over() = 0);
    if Transport=_Transport then continue;
    if find(_Transport,Transport,'it')>0 then
      do;
        delete;
      end;
  end;
run;

proc print data=want;
run;

/* SQL using EXIST clause */
proc sql feedback;
/*  create table want2 as*/
    select 
      o.*
    from have o
    where not exists
      (
        select *
        from have i
        where
          i.group_id=o.group_id
          and i.transport ne o.transport
          and find(i.Transport,o.Transport,'it')>0
      )
    ;
quit;

If you don't understand the code then search in the SAS docu what you find about the Hash object and the SQL EXIST clause.

If you're new to SQL then not sure which of the two options is easier to understand for you. If you've got already some familiarity with SQL then it's likely the EXIST clause.

PeterClemmensen
Tourmaline | Level 20

@Nicholasamh Welcome to the SAS Community! 🙂

 

data have;
input ID Transport $ 3-103;
infile datalines truncover;
datalines;
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Airplane GA
2 Japan Airplane GA
;

data want(drop=i j a);
    array _{9999} $ 100 _temporary_;
    
    do i=1 by 1 until (last.id);
        set have;
        by id;
        _[i]=Transport;
    end;

    do until (last.id);
        set have;
        by id;
        a=0;
        do j=1 to dim(_);
           a=max(a, (find(_[j], transport, 'it') & _[j] ne transport));
           if j=dim(_) & a=0 then output;
        end;
    end;
run;

Result:

 

ID  Transport
1   Singapore Car A
1   UK Van CA
2   Japan Airplane GA
s_lassen
Meteorite | Level 14

The simplest solution is probably something like this:

proc sql;
  create table want as 
  select * from have base where not exists(
    select * from have where id=base.id and Transport ne base.Transport 
    and indexw(Transport,trim(base.transport))>0
) ;
Ksharp
Super User
data have;
input ID Transport $ 3-103;
infile datalines truncover;
datalines;
1 Singapore Car A
1 Car A
1 UK
1 UK Van CA
2 Airplane GA
2 Japan Airplane GA
;
proc sql;
create table want as
select distinct a.*
 from have as a,have as b
  where a.id=b.id and a.Transport ne b.Transport 
and a.Transport contains strip(b.Transport);
quit;
Nicholasamh
Calcite | Level 5
Thanks for all the great solutions

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 644 views
  • 1 like
  • 5 in conversation