Solved: Hash case insensitive in the key

bebess · Posted 08-14-2018 05:42 AM

/* is there a way to have upcase(key:) in order to not have case issues in the key variables */

/* with SQL, the solution is :

upcase(a.key)=upcase(b.key)

*/

data want (drop=rc);

/*iterate left data set*/

set have ;

/*declare variables from hash set*/

length rate_1 rc 8 ;

/*declare hash*/

if _n_=1 then

do;

declare hash hhh(dataset: "table_ref (keep=rate_1 BUSINESS_AREA )", multidata:'y');

hhh.DefineKey('BUSINESS_AREA');

hhh.DefineData(ALL:'YES');

hhh.DefineDone();

end;

do until(DerniereObservation);

set have end=DerniereObservation;

call missing(rate_1);

rc = hhh.find(key:BUSINESS_AREA);

if rc ne 0 then

do;

var =.;

output;

end;

do while (rc=0);

var =rate_1*10;

output;

rc = hhh.find_next();

end;

run;

Patrick · Posted 08-14-2018 08:49 AM

@bebess

Assuming this is about the casing of the actual values and not the variable names holding the values.

Any string operation will have some impact on performance - but changing the case to align your values is something you can't avoid.

Any read/write operation will impact on performance - you can code to minimize passes through the data. Below an approach how to do so.

data ref;
  set sashelp.class(keep=name age);
  name=upcase(name);
run;

data have;
  set sashelp.class(drop=age);
  name=propcase(name);
run;

data want(drop=_:);

  if _n_=1 then
    do;
      /* declare the hash */
      dcl hash h1();
      h1.defineKey('name');
      h1.defineData('age');
      h1.defineDone();
      /* load the hash */
      do until(last);
        set ref end=last;
        name=lowcase(name);
        h1.ref();
      end;
    end;

  set have;

  _name_lower=lowcase(name);
  _rc=h1.find(key:_name_lower);
run;

View solution in original post

RW9 · Posted 08-14-2018 06:03 AM

Can you not just before that step:

data have;
  set have;
  key=upcase(key);
run;

You might be able to do it in the same step as the hash, not sure as never used hash.

KachiM · Posted 08-14-2018 06:36 AM

You may up case the KEY before loading the hash table. Since there will be few up cases in the KEY, it will be slightly efficient to use low case for there will be less work for the function.

cheers

DATASP

ChrisNZ · Posted 08-14-2018 08:01 AM

It is not possible to do what you want.

If you can't upcase the values in the source table before loading it in the hash table, 2 options:

1 - upcase them as you go and load the hash table row by row with hhh.add()

2 - If there are only 2 possible cases, try to find twice: hhh.find() and hhh.find(key:upcase(BUSINESS_AREA))

Your question indicates messy data though; strive for clean data and you wan't have to worry about this kind of issue.

High-Performance SAS Coding - Third Edition

bebess · Posted 08-14-2018 08:40 AM

oh ok . it is not possible to apply the same as in SQL meaning upcase in the 2 tables directly :(.

1. so i need to modify before the 2 tables for all characters variables concerned . i did'nt want to do this ,i was thinking that it could be possible in the HASH ( more efficient ? ) ... i wanted to try something directly in the hash function.

2. there is more than 2 possibilities ,

hhh.find(key:upcase(BUSINESS_AREA)) only upcase in the second table ( table_ref2) , not in the set table .

if _n_=1 then

do;

declare hash hhh(dataset: "table_ref2 (keep=rate_1 BUSINESS_AREA )", multidata:'y');

hhh.DefineKey('BUSINESS_AREA');

hhh.DefineData(ALL:'YES');

hhh.DefineDone();

end;

do until(DerniereObservation);

set have2 end=DerniereObservation;

call missing(rate_1);

rc = hhh.find(key:BUSINESS_AREA);

rc2= hhh.find(key:lowcase(BUSINESS_AREA));

bebess · Posted 08-14-2018 08:24 AM

Thanks for your responses,

i have many character variables in the key ( here an example with 2 ) and the case issue concern the 2 tables ( the orignal table and the ref table i want to join ) .

for example i can have in the original table : All_buisness and in the ref table ALL_BUISNESS , so i need to apply either upcase/lowcase in the first but also in th second table

i found a solution by creating SAS datasets views ..... but do you know the impact in performance ... ( big datasets ...)

data table_ref2 / view=table_ref2;

set table_ref;

BUSINESS_AREA = lowcase(BUSINESS_AREA);

run;

data have2 / view=have2;

set have;

BUSINESS_AREA = lowcase(BUSINESS_AREA);

run;

data have3(drop=rc);

/*iterate left data set*/

set have2 ;

/*declare variables from hash set*/

length rate_1 rc 8 ;

/*declare hash*/

if _n_=1 then

do;

declare hash hhh(dataset: "table_ref2(keep=rate_1 BUSINESS_AREA TYPE_OF_POL )", multidata:'y');

hhh.DefineKey('BUSINESS_AREA','TYPE_OF_POL');

hhh.DefineData(ALL:'YES');

hhh.DefineDone();

end;

do until(DerniereObservation);

set have2 end=DerniereObservation;

call missing(rate_1);

rc = hhh.find(key:BUSINESS_AREA,key:TYPE_OF_POL);

if rc ne 0 then

do;

var =.;

output;

end;

do while (rc=0);

var =rate_1*10;

output;

rc = hhh.find_next();

end;

run;

ChrisNZ · Posted 08-14-2018 08:32 AM

If your are talking about the variable names, SAS is normally not case sensitive. So your code should run fine without modifications.

High-Performance SAS Coding - Third Edition

bebess · Posted 08-14-2018 08:42 AM

i am not talking avout SAS names but the values of the variables

ChrisNZ · Posted 08-14-2018 05:58 PM

>i am not talking avout SAS names but the values of the variables

This was not obvious from your confusing answer:

for example i can have in the original table : All_buisness and in the ref table ALL_BUISNESS , so i need to apply either upcase/lowcase in the first but also in th second table

High-Performance SAS Coding - Third Edition

Patrick · Posted 08-14-2018 08:49 AM

@bebess

Assuming this is about the casing of the actual values and not the variable names holding the values.

Any string operation will have some impact on performance - but changing the case to align your values is something you can't avoid.

Any read/write operation will impact on performance - you can code to minimize passes through the data. Below an approach how to do so.

data ref;
  set sashelp.class(keep=name age);
  name=upcase(name);
run;

data have;
  set sashelp.class(drop=age);
  name=propcase(name);
run;

data want(drop=_:);

  if _n_=1 then
    do;
      /* declare the hash */
      dcl hash h1();
      h1.defineKey('name');
      h1.defineData('age');
      h1.defineDone();
      /* load the hash */
      do until(last);
        set ref end=last;
        name=lowcase(name);
        h1.ref();
      end;
    end;

  set have;

  _name_lower=lowcase(name);
  _rc=h1.find(key:_name_lower);
run;

Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

Re: Hash case insensitive in the key

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away