Dear Community members,
I have a dataset where I need to get all the hashed data into a macro variable & search that in original dataset. I am giving sample data below. I could do the code for one macro variable (hash_list1) , but not the other one (hash_list2). Can anyone help me how to write the code generically so that I can use the same code for hash_list1 & hash_list2. For example, you can use Macro Parameters so that this code can be used anywhere. Any help with this is appreciated.
I have coded so that I can do it for hash_list1 and I don't want to repeat the same code for hash_list2. Could someone help me to write the code? Thanks in advance !!!
options symbolgen mlogic mprint;
/* The total dataset */
data FullList;
do List = 1 to 1000000;
hash_list=upcase(put(sha256(strip(put(list,best.))||"d8pZpXvS0Q"),$hex64.));
output;
end;
run;
/* create hash_list1 & hash_list2 macros*/
data ValueList1;
call streaminit(1000000);
do i = 1 to 10000;
list1 = rand("integer", 100, 1000000);
hash_list1=upcase(put(sha256(strip(put(list1,best.))||"d8pZpXvS0Q"),$hex64.));
list2 = rand("integer", 1000, 1000000);
hash_list2=upcase(put(sha256(strip(put(list2,best.))||"d8pZpXvS0Q"),$hex64.));
output;
end;
run;
/* The number of records of the macro dataset*/
data ValueList2;
set ValueList1;
Count + 1;
call symputx('Count',Count);
run;
%put &Count.;
/* Break long macro into bins */
data _null_;
NumberBins=floor(&Count./500);
call symputx('NumberBins',NumberBins);
run;
%put &NumberBins. ;
/* get the values into macro variables */
%macro AddToBins();
%do BinCount = 0 %to &NumberBins.;
data _null_;
L=cats('L',&BinCount.);
call symputx('L',L);
run;
%global &L.;
proc sql noprint;
select "'"||strip(hash_list1)||"'" into :&L. separated by ','
from ValueList2
where &BinCount.*500 <= Count <= (&BinCount.*500)+(500-1) ;
quit;
%end;
%mend AddToBins;
%AddToBins;
%put &L0.***&L20. ;
/* code to include for the split macro variables*/
%macro splits (n_splits);
%local k;
%do k = 1 %to &n_splits;
hash_list in (&&L&k)
%if &k < &n_splits %then or;
%end;
%mend splits;
/* Search from original data to get the values from macro variables*/
data SearchResult2;
set FullList;
where(%splits (&NumberBins.));
run;
Why do you need the values in a macro variable? Why not just leave the values in a dataset and use the dataset in your query?
are you putting this list into something like this:
hashedVariable in (&listOfHashes.)
?
Bart
You need to watch out that the generated SQL is not too long for the remote database.
The easiest way to put 10 pounds in the the 5 pound bag of a macro variable is to use a macro variable that references other macro variables.
Consider this example:
data _null_;
call symputx('v1','A B C');
call symputx('v2','D E F');
call symputx('both','&v1 &v2');
run;
%put &=both;
828 %put &=both; BOTH=A B C D E F
So you can use a data step to generate both the series of shorter macro variables and also one extra macro variable that references all of the others.
Let's try it with the values of NAME from SASHELP.CLASS. So let's limit macro variables to a maximum length of 40 bytes so that we will need more than one to hold all of the names.
data _null_;
length one all $100 ;
retain all varnum;
do until(eof or (length(one) > 40));
set sashelp.class end=eof;
one = catx(',',one,quote(trim(name),"'"));
end;
varnum+1;
call symputx(cats('v',varnum),one);
all=catx(',',all,cats('&v',varnum));
if eof then call symputx('all',all);
run;
Result
934 %put &=v1; V1='Alfred','Alice','Barbara','Carol','Henry' 935 %put &=v2; V2='James','Jane','Janet','Jeffrey','John','Joyce' 936 %put &=v3; V3='Judy','Louise','Mary','Philip','Robert','Ronald' 937 %put &=v4; V4='Thomas','William' 938 %put ALL=%superq(all); ALL=&v1,&v2,&v3,&v4 939 %put &=all ; ALL='Alfred','Alice','Barbara','Carol','Henry','James','Jane','Janet','Jeffrey','John','Joyce','Judy','Louise','Mary','Philip','Robe rt','Ronald','Thomas','William'
Tom, It works well with small dataset. Could you implement the same for huge dataset with over 65000 bytes for a variable? I gave sample dataset so that what ever logic we could implement can be applied to that dataset and make sure it works. I need to break the string into multiple bins and can't concatenate into a single macro variable (as that would exceed the macro limit of 65,000 bytes). I separated it out into multiple strings with less than 65,000 bytes and query individual macro in the code as shown below. Thanks
%macro splits (n_splits);
%local k;
%do k = 1 %to &n_splits;
hash_list in (&&L&k)
%if &k < &n_splits %then or;
%end;
%mend splits;
@buddha_d wrote:
Tom, It works well with small dataset. Could you implement the same for huge dataset with over 65000 bytes for a variable? I gave sample dataset so that what ever logic we could implement can be applied to that dataset and make sure it works. I need to break the string into multiple bins and can't concatenate into a single macro variable (as that would exceed the macro limit of 65,000 bytes). I separated it out into multiple strings with less than 65,000 bytes and query individual macro in the code as shown below. Thanks
%macro splits (n_splits); %local k; %do k = 1 %to &n_splits; hash_list in (&&L&k) %if &k < &n_splits %then or; %end; %mend splits;
I cannot understand how this SPLITS macro has anything to do with your question of how to generate a long list of values to use in pass thru SQL query.
Are you saying there is some limit imposed by the remote database such that you cannot just use a single IN operator with the full list of values?
Like this:
hash_list in (&all)
But instead have to make a series of IN operators on shorter lists?
If so then adapt my code so that instead of generating ALL as
&v1,&v2...
It generates it as
hash_list in (&v1) or hash_list in (&v2) ...
And then instead of using this in your query:
where hash_list in (&all)
You could use
where &all
Note that when adapting my example to your actual data that you will need make lists of less than 32K bytes (the limit for a data step variable) instead the larger 64K limit on the length of a macro variable. Also notice how I stopped adding values when the length was longer than 40 instead of trying to hit exactly 100. It is better to make the individual list a little shorter than have a value get truncated.
ok, so I assume you cannot upload your data into temporary table, I had such situation with one oracle data base. My proposition is to use macroAray package (https://github.com/SASPAC/macroarray).
steps are:
1) enable SAS Packages Framework
2) install macroArray
package
3) load the package into your session
4) I commented out help info call
5) create temporary file with one list with single-quoted strings
6) use array
macro to create list of macro variables with hash_list
prefix
(the put statement prints out first 3 into the log)
7) use do_over
macro inside SQL pass through
here is the code:
filename packages "%sysfunc(pathname(work))"; /* setup WORK as temporary directory for packages */
filename SPFinit url "https://raw.githubusercontent.com/yabwon/SAS_PACKAGES/main/SPF/SPFinit.sas";
%include SPFinit; /* enable the framework */
%installPackage(macroArray) /* install a package */
%loadPackage(macroArray) /* load the package content into the SAS session */
/*
%helpPackage(macroArray, '%array()')
%helpPackage(macroArray, '%do_over()')
*/
data temp;
length hash_list $ 70;
set
ValueList1(keep=hash_list1 rename=(hash_list1=hash_list))
ValueList1(keep=hash_list2 rename=(hash_list2=hash_list))
;
hash_list=quote(strip(hash_list),"'");
run;
%array(ds = temp, vars = hash_list|, macarray=Y)
%put %hash_list(1) %hash_list(2) %hash_list(3);
options NOmprint;
proc sql;
connect to DATABASE(.....);
select * from connection to
(
select *
form TABLE
where hashedID in (
%do_over(hash_list, between=%str(,))
)
)
;
run;
/* this is just a test code so you could see how %do_over() works */
data test_do_over;
do i = %do_over(hash_list, between=%str(,)) ;
output;
end;
run;
If the hashedID in (...)
list is limited to some small number (Oracle has 1000 limitation) you can tray to use the WITH clause (https://www.geeksforgeeks.org/sql-with-clause/) and create "in code temporary table":
with tempTable as
(
%do_over(hash_list
, phrase=%nrstr(select %hash_list(&_i_) as hasID )
, between=%str( union all )
)
)
which will produce something like:
with tempTable as
(
select 'VJKAUY7HGVJZ7UTGHVBZJKF3UT8Y' as hasID
union all
select 'VJKNCVJKAW3457U8YGAUY7HGVJZ7' as hasID
union all
...
select 'KSDT89723T7GVSDXFJKZNVIE90TG' as hasID
)
Bart
The correct (and simple) way of doing this is to upload the lookup data to a temporary table in the DB and join there.
As @Tom wrote, do it in data step:
data want;
declare hash H();
H.defineKey("hash_list");
H.defineData("_N_");
H.defineDone();
/* put lists into hash table */
do until(eof);
set ValueList1(keep=hash_list1 hash_list2) end=eof;
_N_=H.add(key:hash_list1,data:1);
_N_=H.add(key:hash_list2,data:1);
drop hash_list1 hash_list2;
end;
/* search full dataset */
do until(endFull);
set Fulllist end=endFull;
if 0=H.check() then output;
end;
stop;
run;
Bart
yabwon, The actual data is in SQL and I am connecting through SAS using PassThrough facility. Hence, I could only store in macros and search those string against the Database. I published the sample data so that every one can understand the code easily. Thanks for asking.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.