Hi everyone.
I'm new to SAS and I'm trying to solve the following problem.
I have a list of company names (>1000) and a list of "approved" companies (around 50), whose names are only substrings of the official names. I need to check if a company is approved. For example:
Company list: "ABC Inc.", "Western Something Group Inc.", "Goose and partners LLP"
Approved list: "ABC", "Goose"
It should return "ABC Inc." and "Goose and partners LLP". Or mark them as "TRUE" in the dataset.
I understand that for each company ( "ABC Inc.") I need to loop over the list of approved companies and check if a substring is in a company name (FIND function I guess). I would imagine that it should be done in a data step because it will loop over the Company list. But I can't put it all together.
I'd appreciate any ideas.
Here is an example of doing something similar with a data set you should have access to so you can test the code.
data example;
   set sashelp.class;
   array nn (3) $ 5 _temporary_ ("Ma", "Ja", "Zz");
   do i=1 to dim(nn);
      if find(name,strip(nn[i]),'i')>0 then do;
         found=1;
         match=nn[i];
         leave;
      end;
   end;
   drop i;
run;
The array NN is temporary, meaning its values are not written to the output data set. The dimension, the (3) in this case, must match the number of values to search for, and the length, the $ 5 in this case, needs to be at least as long as the longest value. Then place each value in the ( ) of the array definition.
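With around 50 approved names, typing each value into the array definition gets tedious. If the approved names already sit in a data set, the array could be filled from it instead. The following is only a sketch under assumptions: the data set names COMPANIES and APPROVED, the variable names COMPANY and NAME, the lengths, and the upper bound of 100 array elements are all made up for illustration.

```sas
/* Hypothetical input data sets; names and lengths are assumptions */
data companies;
   input company $50.;
   cards;
ABC Inc.
Western Something Group Inc.
Goose and partners LLP
;

data approved;
   input name $20.;
   cards;
ABC
Goose
;

data flagged;
   /* 100 is an assumed upper bound on the number of approved names */
   array appr [100] $ 20 _temporary_;
   /* On the first iteration only, copy the approved names into the array */
   if _n_ = 1 then do k = 1 to n_appr;
      set approved point=k nobs=n_appr;
      appr[k] = name;
   end;
   set companies;
   found = 0;
   /* Same search loop as above, limited to the names actually loaded */
   do k = 1 to n_appr;
      if find(company, strip(appr[k]), 'i') > 0 then do;
         found = 1;
         match = appr[k];
         leave;
      end;
   end;
   drop name;
run;
```

If APPROVED ever holds more rows than the array has elements, the step stops with an array subscript error, so the bound needs to be at least as large as the list.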
The STRIP in the FIND function call is needed because otherwise SAS searches for the full length of the array element, trailing blanks included. The 'i' modifier tells FIND to ignore case.
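A small test shows the effect of the trailing blanks. This is a sketch with a made-up variable SUB; it also uses the 't' modifier of FIND, which trims trailing blanks and could be used in place of STRIP.

```sas
data _null_;
   length sub $ 5;
   sub = "Ma";                            /* stored as "Ma" plus three blanks */
   padded   = find("Mary", sub);          /* searches for "Ma   ", so no match (0) */
   stripped = find("Mary", strip(sub));   /* searches for "Ma", match at position 1 */
   trimmed  = find("Mary", sub, 't');     /* 't' trims trailing blanks, match at position 1 */
   put padded= stripped= trimmed=;
run;
```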
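The LEAVE statement quits searching when the first match is found.
SAS treats 0 and missing numeric values as false and any other numeric value as true. In the code above FOUND is 1 for a match and stays missing otherwise, so it can be used directly in a condition such as IF FOUND.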
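To see how SAS evaluates numeric values in a condition, a quick check like the one below can be run. It is only an illustration; the variable VALUE and the test values are arbitrary.

```sas
data _null_;
   /* Missing and 0 take the ELSE branch; every other value takes the THEN branch */
   do value = ., 0, 1, 5, -2;
      if value then put value= "is treated as true";
      else put value= "is treated as false";
   end;
run;
```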
You basically need to use a DO loop to check for each word.
Here is an example:
data have;
input company $50.;
cards;
ABC Inc.
Western Something Group Inc.
Goose and partners LLP
;
data want;
set have;
array list [2] $20 _temporary_ ("ABC" "Goose");
do index=1 to dim(list) until(found);
found = findw(company,strip(list[index]));
end;
if not found then put 'Deleted: ' company;
else output;
run;
1072  data want;
1073    set have;
1074    array list [2] $20 _temporary_ ("ABC" "Goose");
1075    do index=1 to dim(list) until(found);
1076      found = findw(company,strip(list[index]));
1077    end;
1078    if not found then put 'Deleted: ' company;
1079    else output;
1080  run;

Deleted: Western Something Group Inc.
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
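Note that FINDW, used here, and FIND, used in the other reply, do not behave the same way: FINDW only matches whole words, while FIND matches any substring. Which one is right depends on whether "ABC" should also approve a name like "ABCD Corp". A small comparison, using made-up company names:

```sas
data _null_;
   a = find("ABCD Corp", "ABC");    /* substring match, position 1 */
   b = findw("ABCD Corp", "ABC");   /* "ABC" is not a whole word here, so 0 */
   c = findw("ABC Inc.", "ABC");    /* whole-word match, position 1 */
   put a= b= c=;
run;
```

Both calls as written are case sensitive; FIND was made case insensitive in the other reply by passing the 'i' modifier.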