Jack_Loktiev
Fluorite | Level 6

Hi everyone.

 

I'm new to SAS and I'm trying to solve the following problem.

I have a list of company names (more than 1,000) and a list of "approved" companies (around 50). The approved names are only substrings of the official names. I need to check whether each company is approved. For example:

Company list: "ABC Inc.", "Western Something Group Inc.", "Goose and partners LLP"

Approved list: "ABC", "Goose"

It should return "ABC Inc." and "Goose and partners LLP". Or mark them as "TRUE" in the dataset.

 

I understand that for each company (e.g. "ABC Inc.") I need to loop over the list of approved companies and check whether any approved name is a substring of the company name (with the FIND function, I guess). I imagine this should be done in a data step, since it already loops over the company list, but I can't put it all together.

 

I'd appreciate any ideas. 

1 ACCEPTED SOLUTION

ballardw
Super User

Here is an example of doing something similar with a data set you should have access to so you can test the code.

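data example;
   set sashelp.class;
   /* search terms: dimension = number of values, length = longest value */
   array nn (3) $ 5 _temporary_ ("Ma", "Ja", "Zz");
   do i=1 to dim(nn);
      /* STRIP removes the padding blanks; 'i' ignores case */
      if find(name,strip(nn[i]),'i')>0 then do;
         found=1;
         match=nn[i];
         leave;   /* stop at the first match */
      end;
   end;
   drop i;
run;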

The array NN is temporary, meaning its elements are not written to the output data set. The dimension (3 in this case) must match the number of values to search for, and the length (the $ 5 in this case) must be at least as long as the longest value. Then place each value inside the ( ) of the array definition.

The STRIP in the FIND function call is needed because otherwise SAS searches for the value at its full length, including the trailing blanks that pad it to 5 characters. The 'i' modifier makes the search ignore case.
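To see the effect of the trailing blanks, here is a small stand-alone sketch. The variable names are made up for illustration. The 't' modifier of FIND is shown only as an alternative way to ignore trailing blanks; it is not used in the code above.

```sas
data _null_;
   length pattern $ 5;
   pattern = "Ma";   /* stored as "Ma" followed by three blanks */

   /* searches for all 5 characters, blanks included: expected 0 */
   padded = find("Mary", pattern);

   /* STRIP removes the padding before the search: expected 1 */
   stripped = find("Mary", strip(pattern));

   /* 't' trims trailing blanks from both arguments: expected 1 */
   trimmed = find("Mary", pattern, 't');

   put padded= stripped= trimmed=;
run;
```

The PUT statement writes the three results to the log so you can compare them.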

The LEAVE statement exits the loop as soon as the first match is found.

 

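SAS treats any numeric value other than 0 or missing as true, and 0 or missing as false. In the example FOUND is 1 for a match and stays missing otherwise, so you can test it directly with IF FOUND.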
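With around 50 approved names, typing every value into the array definition gets tedious. As one possible variation (a sketch, not part of the answer above), the names could be read from a data set into the temporary array on the first iteration. The data set and variable names (COMPANIES, APPROVED, COMPANY, NAME) and the array size of 100 are assumptions for illustration.

```sas
/* Hypothetical sample data, taken from the example in the question */
data companies;
   input company $50.;
datalines;
ABC Inc.
Western Something Group Inc.
Goose and partners LLP
;

data approved;
   input name $20.;
datalines;
ABC
Goose
;

data flagged;
   /* 100 is an assumed upper bound on the number of approved names */
   array appr [100] $ 20 _temporary_;

   /* first iteration only: copy the approved names into the array */
   if _n_ = 1 then do i = 1 to n_appr;
      set approved nobs=n_appr;
      appr[i] = name;
   end;

   set companies;
   found = 0;
   do i = 1 to n_appr;
      if find(company, strip(appr[i]), 'i') > 0 then do;
         found = 1;
         leave;   /* stop at the first match */
      end;
   end;
   drop i name;
run;
```

Temporary array elements are retained across iterations, so the approved names are loaded only once. Here FOUND is set to 0 explicitly, so rows without a match get 0 rather than a missing value.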


3 REPLIES
Tom
Super User

You basically need to use a DO loop to check for each word.

Here is an example:

data have;
  input company $50.;
cards;
ABC Inc.
Western Something Group Inc.
Goose and partners LLP
;

data want;
  set have;
  array list [2] $20 _temporary_ ("ABC" "Goose");
  do index=1 to dim(list) until(found);
    found = findw(company,strip(list[index]));
  end;
  if not found then put 'Deleted: ' company;
  else output;
run;
Log:

Deleted: Western Something Group Inc.
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
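
Note that FINDW matches whole words, while FIND matches any substring, so the two functions can disagree. A minimal sketch of the difference follows; the company name "ABCD Holdings Inc." is made up for illustration and does not come from the question.

```sas
data _null_;
   company = "ABCD Holdings Inc.";

   /* substring search: "ABC" occurs inside "ABCD", expected 1 */
   as_substring = find(company, "ABC");

   /* word search: "ABC" is not a whole word here, expected 0 */
   as_word = findw(company, "ABC");

   put as_substring= as_word=;
run;
```

Which behavior is right depends on the data: FINDW avoids flagging "ABCD Holdings Inc." as the approved company "ABC", but FIND is needed if an approved name can appear as part of a longer word.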
Jack_Loktiev
Fluorite | Level 6
Thank you @Tom and @ballardw. Both solutions work great. I marked @ballardw's as accepted because it happens to be closer to my code.
