BookmarkSubscribeRSS Feed
buddha_d
Pyrite | Level 9

Dear Community members,

              I have a dataset where I need to get all the hashed data  into a macro variable & search that in original dataset. I am giving sample data below. I could do the code for one macro variable (hash_list1) , but not the other one (hash_list2).  Can anyone help me how to write the code generically so that I can use the same code for hash_list1 & hash_list2. For example, you can use Macro Parameters so that this code can be used anywhere. Any help with  this is appreciated. 

           I have coded so that I can do it for hash_list1 and I don't want to repeat the same code for hash_list2. Could someone help me to write the code? Thanks in advance !!!

 

options symbolgen mlogic mprint; 


/* The total dataset */
data FullList; 
do List = 1 to 1000000; 
hash_list=upcase(put(sha256(strip(put(list,best.))||"d8pZpXvS0Q"),$hex64.));
output; 
end; 
run;

/*  create hash_list1 & hash_list2 macros*/

data ValueList1;
   call streaminit(1000000);
   do i = 1 to 10000;
   list1 = rand("integer", 100, 1000000);
   hash_list1=upcase(put(sha256(strip(put(list1,best.))||"d8pZpXvS0Q"),$hex64.));
   list2 = rand("integer", 1000, 1000000);
   hash_list2=upcase(put(sha256(strip(put(list2,best.))||"d8pZpXvS0Q"),$hex64.));

   output;
   end;
run;

/* The number of records of the macro dataset*/

data ValueList2;
set ValueList1;
Count + 1;
call symputx('Count',Count);
run;

%put &Count.;

/* Break long macro into bins */

data _null_;
NumberBins=floor(&Count./500);
call symputx('NumberBins',NumberBins);
run;

%put &NumberBins. ;

/* get the values into macro variables */

%macro AddToBins();
%do BinCount = 0 %to &NumberBins.;
data _null_;
L=cats('L',&BinCount.);
call symputx('L',L);
run;
%global &L.;
proc sql noprint;
select "'"||strip(hash_list1)||"'" into :&L. separated by ','
from ValueList2
where &BinCount.*500 <= Count <= (&BinCount.*500)+(500-1) ;
quit;
%end;
%mend AddToBins;
%AddToBins;


%put &L0.***&L20. ;

/* code to include for the split macro variables*/

%macro splits (n_splits);
   %local k;
   %do k = 1 %to &n_splits;
      hash_list in (&&L&k)
      %if &k < &n_splits %then or;
   %end;
%mend splits;


/* Search from original data to get the values from macro variables*/

data SearchResult2; 
set FullList; 
where(%splits (&NumberBins.)); 
run;

10 REPLIES 10
Tom
Super User Tom
Super User

Why do you need the values in a macro variable?  Why not just leave the values in a dataset and use the dataset in your query?

buddha_d
Pyrite | Level 9
Tom, I need to have them in macros as I am running query with Passthrough for SQL DB.
yabwon
Onyx | Level 15

are you putting this list into something like this:

hashedVariable in (&listOfHashes.)

?

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

You need to watch out that the generated SQL is not too long for the remote database.

 

The easiest way to put 10 pounds in the the 5 pound bag of a macro variable is to use a macro variable that references other macro variables.

 

Consider this example:

data _null_;
  call symputx('v1','A B C');
  call symputx('v2','D E F');
  call symputx('both','&v1 &v2');
run;
%put &=both;
828  %put &=both;
BOTH=A B C D E F

So you can use a data step to generate both the series of shorter macro variables and also one extra macro variable that references all of the others.

 

Let's try it with the values of NAME from SASHELP.CLASS.  So let's limit macro variables to a maximum length of 40 bytes so that we will need more than one to hold all of the names.

data _null_;
  length one all $100 ;
  retain all varnum;
  do until(eof or (length(one) > 40));
    set sashelp.class end=eof;
    one = catx(',',one,quote(trim(name),"'"));
  end;
  varnum+1;
  call symputx(cats('v',varnum),one);
  all=catx(',',all,cats('&v',varnum));
  if eof then call symputx('all',all);
run;

Result

934  %put &=v1;
V1='Alfred','Alice','Barbara','Carol','Henry'
935  %put &=v2;
V2='James','Jane','Janet','Jeffrey','John','Joyce'
936  %put &=v3;
V3='Judy','Louise','Mary','Philip','Robert','Ronald'
937  %put &=v4;
V4='Thomas','William'
938  %put ALL=%superq(all);
ALL=&v1,&v2,&v3,&v4
939  %put &=all ;
ALL='Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Mary','Philip','Robe
rt','Ronald','Thomas','William'

buddha_d
Pyrite | Level 9

Tom, It works well with small dataset. Could you implement the same for huge dataset with over 65000 bytes for a variable? I gave sample dataset so that what ever logic we could implement can be applied to that dataset and make sure it works. I need to break the string into multiple bins and can't concatenate into a single macro variable (as that would exceed the macro limit of 65,000 bytes). I separated it out into multiple strings with less than 65,000 bytes and query individual macro in the code as shown below. Thanks

%macro splits (n_splits);
   %local k;
   %do k = 1 %to &n_splits;
      hash_list in (&&L&k)
      %if &k < &n_splits %then or;
   %end;
%mend splits;
Tom
Super User Tom
Super User

@buddha_d wrote:

Tom, It works well with small dataset. Could you implement the same for huge dataset with over 65000 bytes for a variable? I gave sample dataset so that what ever logic we could implement can be applied to that dataset and make sure it works. I need to break the string into multiple bins and can't concatenate into a single macro variable (as that would exceed the macro limit of 65,000 bytes). I separated it out into multiple strings with less than 65,000 bytes and query individual macro in the code as shown below. Thanks

%macro splits (n_splits);
   %local k;
   %do k = 1 %to &n_splits;
      hash_list in (&&L&k)
      %if &k < &n_splits %then or;
   %end;
%mend splits;

I cannot understand how this SPLITS macro has anything to do with your question of how to generate a long list of values to use in pass thru SQL query. 

 

Are you saying there is some limit imposed by the remote database such that you cannot just use a single IN operator with the full list of values?

Like this:

hash_list in (&all)

But instead have to make a series of  IN operators on shorter lists?

If so then adapt my code so that instead of generating ALL as 

&v1,&v2...

It generates it as 

hash_list in (&v1) or hash_list in (&v2) ...

And then instead of using this in your query:

where hash_list in (&all)

You could use

where &all

Note that when adapting my example to your actual data that you will need make lists of less than 32K bytes (the limit for a data step variable) instead the larger 64K limit on the length of a macro variable.  Also notice how I stopped adding values when the length was longer than 40 instead of trying to hit exactly 100.  It is better to make the individual list a little shorter than have a value get truncated.

yabwon
Onyx | Level 15

ok, so I assume you cannot upload your data into temporary table, I had such situation with one oracle data base. My proposition is to use macroAray package (https://github.com/SASPAC/macroarray).

 

steps are:

1) enable SAS Packages Framework

2) install macroArray package

3) load the package into your session

4) I commented out help info call 

5) create temporary file with one list with single-quoted strings

 

6) use array macro to create list of macro variables with hash_list prefix

(the put statement prints out first 3 into the log)

 

7) use do_over macro inside SQL pass through

 

here is the code:

filename packages "%sysfunc(pathname(work))"; /* setup WORK as temporary directory for packages */
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit; /* enable the framework */

%installPackage(macroArray) /* install a package */
%loadPackage(macroArray)    /* load the package content into the SAS session */

/*
%helpPackage(macroArray, '%array()')
%helpPackage(macroArray, '%do_over()')
*/


data temp;
  length hash_list $ 70;
  set 
    ValueList1(keep=hash_list1 rename=(hash_list1=hash_list))   
    ValueList1(keep=hash_list2 rename=(hash_list2=hash_list))
  ;
  hash_list=quote(strip(hash_list),"'");
run;




%array(ds = temp, vars = hash_list|, macarray=Y)

%put %hash_list(1) %hash_list(2) %hash_list(3);

options NOmprint;
proc sql;
  connect to DATABASE(.....);


  select * from connection to
  (

    select * 
    form TABLE
    where hashedID in (
        %do_over(hash_list, between=%str(,))
        )
  )
  ; 
run;

/* this is just a test code so you could see how %do_over() works */
data test_do_over;
  do i = %do_over(hash_list, between=%str(,)) ;
    output;
  end;
run;

If the hashedID in (...) list is limited to some small number (Oracle has 1000 limitation) you can tray to use the WITH clause (https://www.geeksforgeeks.org/sql-with-clause/) and create "in code temporary table":

 

with tempTable as 
(

%do_over(hash_list
, phrase=%nrstr(select %hash_list(&_i_) as hasID )
, between=%str( union all )
)

)

which will produce something like:

with tempTable as 
(

select 'VJKAUY7HGVJZ7UTGHVBZJKF3UT8Y' as hasID 
union all
select 'VJKNCVJKAW3457U8YGAUY7HGVJZ7' as hasID 
union all
...
select 'KSDT89723T7GVSDXFJKZNVIE90TG' as hasID 

)

 

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

As @Tom wrote, do it in data step:

data want;
  declare hash H();
  H.defineKey("hash_list");
  H.defineData("_N_");
  H.defineDone();

  /* put lists into hash table */
  do until(eof);
    set ValueList1(keep=hash_list1 hash_list2) end=eof;
    _N_=H.add(key:hash_list1,data:1);
    _N_=H.add(key:hash_list2,data:1);
    drop hash_list1 hash_list2;
  end;

  /* search full dataset */
  do until(endFull);
    set Fulllist end=endFull;
    if 0=H.check() then output;
  end;

  stop;
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



buddha_d
Pyrite | Level 9

yabwon, The actual data is in SQL and I am connecting through SAS using PassThrough facility. Hence, I could only store in macros and search those string against the Database. I published the sample data so that every one can understand the code easily. Thanks for asking. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 517 views
  • 2 likes
  • 4 in conversation