<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Search  var with best match name in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970508#M377085</link>
    <description>&lt;P&gt;Any time someone asks for "best" solutions I tend to ask "what is the measure used to determine best".&lt;/P&gt;
&lt;P&gt;Typically programming code looks at minimizing time of execution, time to program or resources used such as memory, storage space or network traffic.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have a data set with the names of the variables such as shown then you might consider use of one of the spelling distance functions such as COMPGED or SPEDIS to get a score for similarity to your search name.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;An example:&lt;/P&gt;
&lt;PRE&gt;proc contents data=sashelp.cars;
ods output variables=variables_List;
run;


data search; 
   set variables_list;
   Tofind = 'mpg';
   distance_score = compged(upcase(tofind),upcase(variable));
   keep tofind variable distance_score label;
run;

proc sort data=search;
   by distance_score;
run;&lt;/PRE&gt;
&lt;P&gt;The sort would get the typical "closest" matches at the start of the data set.&lt;/P&gt;
&lt;P&gt;I use the UPCASE versions because the functions used will return a different score based on case of the letters.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since variable names may be stored in mixed case but functionally there would be no difference for most other code between names of "mpg" "Mpg" "mPG" etc then the singe case for&amp;nbsp;&lt;STRONG&gt;any comparison&lt;/STRONG&gt; makes more sense.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that I also picked a variable request that might have 2 "correct" choices. To demonstrate that you may still require another manual step.&lt;/P&gt;
&lt;P&gt;Note that this will return 0 for the distance for a perfect match but the other values may be quite a bit different.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Experiment with the related functions and read the documentation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might try adding a variable such a Really_long_mpg_variable to see how this scoring approach works.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Having dealt with a number of searches matching peoples' names without access the fancier SAS packages, I used an approach of look for an EXACT match first , Upcase(tofind)=Upcase(variable) and only bothered with the score if there was not a match.&lt;/P&gt;
&lt;P&gt;You may want to investigate a search for is one a substring of the other as well. See the result for the Really_long_mpg_variable suggested. The INDEX function might be useful for that.&lt;/P&gt;</description>
    <pubDate>Thu, 10 Jul 2025 07:57:57 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2025-07-10T07:57:57Z</dc:date>
    <item>
      <title>Search  var with best match name</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970507#M377084</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;Lets say I have list of column names that I am looking for .&lt;/P&gt;
&lt;P&gt;I want to search these col names in SASHELP.CARS data set.&lt;/P&gt;
&lt;P&gt;The problem that sometimes the var name I search doesnt exist in sashelp.cars .&lt;/P&gt;
&lt;P&gt;Then, my task is - Find the best candidate var name that&amp;nbsp; has highest matching to var name I am looking for.&lt;/P&gt;
&lt;P&gt;For example:&lt;/P&gt;
&lt;P&gt;For field&amp;nbsp;CAR_Model best candidate is CAR&lt;/P&gt;
&lt;P&gt;For field&amp;nbsp;MSRP&amp;nbsp; best candidate is&amp;nbsp;MSRP (Here have&amp;nbsp; full matching)&lt;/P&gt;
&lt;P&gt;For field&amp;nbsp;CarEngineSize&amp;nbsp; best candidate is&amp;nbsp;EngineSize&lt;/P&gt;
&lt;P&gt;and so on&lt;/P&gt;
&lt;P&gt;what is the way to do it please&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The want data set will have 2 columns- var_search, var_best_candidate&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc contents data=sashelp.cars;
ods output variables=variables_List;
run;

Data Search_Fields;
input field $
Cylin
DriveTrain
CarEngineSize
Horsepower
InvoiceAmnt
Length
MPG_IN_City
MPG_IN_Highway
MSRP
Make
CAR_Model
Origin
CAR_Type
Weight
Wheelbase
;
Run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jul 2025 07:35:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970507#M377084</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2025-07-10T07:35:23Z</dc:date>
    </item>
    <item>
      <title>Re: Search  var with best match name</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970508#M377085</link>
      <description>&lt;P&gt;Any time someone asks for "best" solutions I tend to ask "what is the measure used to determine best".&lt;/P&gt;
&lt;P&gt;Typically programming code looks at minimizing time of execution, time to program or resources used such as memory, storage space or network traffic.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have a data set with the names of the variables such as shown then you might consider use of one of the spelling distance functions such as COMPGED or SPEDIS to get a score for similarity to your search name.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;An example:&lt;/P&gt;
&lt;PRE&gt;proc contents data=sashelp.cars;
ods output variables=variables_List;
run;


data search; 
   set variables_list;
   Tofind = 'mpg';
   distance_score = compged(upcase(tofind),upcase(variable));
   keep tofind variable distance_score label;
run;

proc sort data=search;
   by distance_score;
run;&lt;/PRE&gt;
&lt;P&gt;The sort would get the typical "closest" matches at the start of the data set.&lt;/P&gt;
&lt;P&gt;I use the UPCASE versions because the functions used will return a different score based on case of the letters.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since variable names may be stored in mixed case but functionally there would be no difference for most other code between names of "mpg" "Mpg" "mPG" etc then the singe case for&amp;nbsp;&lt;STRONG&gt;any comparison&lt;/STRONG&gt; makes more sense.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that I also picked a variable request that might have 2 "correct" choices. To demonstrate that you may still require another manual step.&lt;/P&gt;
&lt;P&gt;Note that this will return 0 for the distance for a perfect match but the other values may be quite a bit different.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Experiment with the related functions and read the documentation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might try adding a variable such a Really_long_mpg_variable to see how this scoring approach works.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Having dealt with a number of searches matching peoples' names without access the fancier SAS packages, I used an approach of look for an EXACT match first , Upcase(tofind)=Upcase(variable) and only bothered with the score if there was not a match.&lt;/P&gt;
&lt;P&gt;You may want to investigate a search for is one a substring of the other as well. See the result for the Really_long_mpg_variable suggested. The INDEX function might be useful for that.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jul 2025 07:57:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970508#M377085</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2025-07-10T07:57:57Z</dc:date>
    </item>
    <item>
      <title>Re: Search  var with best match name</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970511#M377086</link>
      <description>&lt;P&gt;That's great,&lt;/P&gt;
&lt;P&gt;May you show a program that find for each var the best matching var?&lt;/P&gt;
&lt;P&gt;Do it all for all vars in one step?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc contents data=sashelp.cars;
ods output variables=variables_List;
run;

Data Search_Fields;
input field $20.;
cards;
Cylin
DriveTrain
CarEngineSize
Horsepower
InvoiceAmnt
Length
MPG_IN_City
MPG_IN_Highway
MSRP
Make
CAR_Model
Origin
CAR_Type
Weight
Wheelbase
;
Run;

%macro Find_Best_Fit_Var(Field);
data search; 
set variables_list;
Tofind ="&amp;amp;Field.";
distance_score = compged(upcase(tofind),upcase(variable));
keep tofind variable distance_score label;
run;
proc sql;
create table Want as
select  *
from search
order by distance_score
;
quit;
data Want_b;
set Want;
 if _N_ = 1 then output;
run;
%mend Find_Best_Fit_Var;
%Find_Best_Fit_Var (Field=CarEngineSize)
%Find_Best_Fit_Var (Field=Cylin)
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 10 Jul 2025 10:27:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970511#M377086</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2025-07-10T10:27:12Z</dc:date>
    </item>
    <item>
      <title>Re: Search  var with best match name</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970541#M377090</link>
      <description>&lt;P&gt;The data is small, so you could use PROC SQL to get a cartesian product and calculate the distance for each combination, something like:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql ;
  create table want as
  select variable
        ,field
        ,compged(upcase(field),upcase(variable)) as distance_score
  from search_fields,variables_list
  group by field
  having distance_score=min(distance_score)
  ;
quit ;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 10 Jul 2025 16:33:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970541#M377090</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2025-07-10T16:33:42Z</dc:date>
    </item>
    <item>
      <title>Re: Search  var with best match name</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970542#M377091</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;For field&amp;nbsp;CAR_Model best candidate is CAR&lt;/P&gt;
&lt;P&gt;For field&amp;nbsp;MSRP&amp;nbsp; best candidate is&amp;nbsp;MSRP (Here have&amp;nbsp; full matching)&lt;/P&gt;
&lt;P&gt;For field&amp;nbsp;CarEngineSize&amp;nbsp; best candidate is&amp;nbsp;EngineSize&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am skeptical about the usage of these functions in this case. I have never tried using them in your case of variable names, but it seems to me these tools are designed to be used on words or names, in which case it can mitigate the effect of spelling errors, or mitigate the effect of there being 2 different ways to spell a word such as "color" and "colour". But neither of those applies to variable names, which are not names or words; where variable names can be spelled differently because different words or abbreviations or ordering of the words can be used in the variable name. So maybe I'm wrong, but this doesn't seem like the right problem to use these tools on.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jul 2025 17:41:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970542#M377091</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2025-07-10T17:41:56Z</dc:date>
    </item>
    <item>
      <title>Re: Search  var with best match name</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970575#M377094</link>
      <description>&lt;P&gt;What if you want to find the closest match to X145, from a list of variables named X14,X15,X125,X147.X345,X545?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10892"&gt;@PaigeMiller&lt;/a&gt;&amp;nbsp;has said, would the use of spelling distance have any utility at all in this case?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jul 2025 23:37:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970575#M377094</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-07-10T23:37:33Z</dc:date>
    </item>
    <item>
      <title>Re: Search  var with best match name</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970587#M377099</link>
      <description>&lt;P&gt;As ballarw showed you , there are several function in SAS to do such FUZZ matched. Like : compged() spelldis() complev() ........ But I would like to show my way/function :&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc contents data=sashelp.cars;
ods output variables=a;
run;
data a;
 set a(keep=Variable rename=(Variable=a));
run;

Data b;
input b $20.;
cards;
Cylin
DriveTrain
CarEngineSize
Horsepower
InvoiceAmnt
Length
MPG_IN_City
MPG_IN_Highway
MSRP
Make
CAR_Model
Origin
CAR_Type
Weight
Wheelbase
;
Run;


proc fcmp outlib=work.math.func;
function fuzz_match(a $,b $)  ;
length new_a new_b str_short str_long  $ 200;
new_a=kcompress(a,,'kad');
new_b=kcompress(b,,'kad');
if klength(a)&amp;lt;klength(b) then do;
 str_short=new_a;str_long=new_b;
end;
else do;
  str_short=new_b;str_long=new_a;
end;

array count{10} _temporary_  (0 0 0  0  0  0  0   0    0    0);  /*The count of one string ,two string , three string....*/
array weight{10} _temporary_ (2 4 8 16 32 64 128 256 512 1024);  /*The weight of one string ,two string , three string....*/
max_str_len=min(10,klength(a),klength(b));  /*Search the max length of str is 10*/
do len=1 to max_str_len;      /*Search one string ,two string , three string .......*/
 n=0;
 do start=1 to klength(str_short)+1-len;  /*Search sub-string in  long string*/
  if kfind(str_long,strip(ksubstr(str_short,start,len))) then n+1;
 end;
 count{len}=n; 
end;


sum=0;w=0;mean=0;
do k=1 to max_str_len;
  if count{k} ne 0 then do;sum+count{k}*weight{k}; w+weight{k};end;
end;
/*Calculated weight mean
if w=0 then mean=0;
 else mean=sum/w;
*/
if a=b then sum=constant('exactint');
return (sum); /*return weight sum or weight mean(mean)*/
endsub;
run;

options cmplib=work.math;
proc sql;
create table want as
select a,b,fuzz_match(a,b) as weight
 from  a, b
  group by a
   having weight=max(weight);
quit;


&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Ksharp_0-1752225169514.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/108296i9EED63438F9C62D1/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Ksharp_0-1752225169514.png" alt="Ksharp_0-1752225169514.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jul 2025 09:13:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970587#M377099</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-07-11T09:13:50Z</dc:date>
    </item>
    <item>
      <title>Re: Search  var with best match name</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970639#M377118</link>
      <description>&lt;P&gt;The solution provided worked 100% for me.&lt;/P&gt;
&lt;P&gt;My task was to find the best candidate using the code and then look with my eyes too verify it found logical candidate&amp;nbsp; and if not then i do it manually using my eyes .&lt;/P&gt;</description>
      <pubDate>Sat, 12 Jul 2025 05:53:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Search-var-with-best-match-name/m-p/970639#M377118</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2025-07-12T05:53:31Z</dc:date>
    </item>
  </channel>
</rss>

