I have a list of IDs in a SAS dataset, which I uploaded from Excel. The IDs are alphanumeric, characters, such as 012345ABCDEF.
I have 200000 observations in this IDs SAS dataset. I need to accomplish 2 things:
1. Take 10000 of the IDs at a time, surround each ID with single quotes, put a comma between them, and put them into a single text string variable. So they would look like this:
'012345ABCDEF', '024680DEFGHI'
2. I'll then take that variable and use it in my passthru query IN statement, such as:
proc sql;
CONNECT TO ORACLE as con1
(connection details here);
create table WANT as
select * from connection to con1
(
select * from oracle_table where ID in (string variable I created above)
);
I'll repeat steps 1 and 2 above until I iterate through all 200K of my SAS table IDs.
First thing is you should probably talk to the owner of the Oracle database and see if they can grant you permission to upload the list into a temporary table in Oracle. That will go much faster.
Second you have assumed that you need to make a single string variable. Instead what you are asking is to create a PROGRAM. The easiest place to store a PROGRAM is in a FILE.
Let's assume you have a dataset named HAVE with the set of unique values of the variable ID that you want to use. You can then use that to write a program that has a series of CREATE TABLE statements. Once you have the program you then use %INCLUDE to run it.
So something like this:
%let limit=10000;
filename code temp;
data _null_;
set have end=eof ;
file code lrecl=80 ;
length qid $20 ;
qid=quote(trim(id),"'");
if mod(_n_,&limit)=1 then do;
fileno+1;
put 'create table WANT' fileno 'as '
/ 'select * from connection to con1 ('
/ 'select * from oracle_table where ID in '
/ '(' @
;
end;
else put ',' @;
put qid @;
if eof or mod(_n_,&limit)=0 then do;
put ')' / ');' ;
end;
if eof then call symputx('nfiles',fileno);
run;
Which you could run using code like this:
proc sql;
connect to oracle as con1 (connection details here);
%include code / source2;
quit;
And you could then put the multiple files back together with a data step like this:
data want;
set want1-want&nfiles;
run;
Here is an example of what the generated program would look like for a list of 100 id values split into a limit of 40 per dataset.
create table WANT1 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000001' ,'000000000002' ,'000000000003' ,'000000000004' ,'000000000005'
,'000000000006' ,'000000000007' ,'000000000008' ,'000000000009' ,'000000000010'
,'000000000011' ,'000000000012' ,'000000000013' ,'000000000014' ,'000000000015'
,'000000000016' ,'000000000017' ,'000000000018' ,'000000000019' ,'000000000020'
,'000000000021' ,'000000000022' ,'000000000023' ,'000000000024' ,'000000000025'
,'000000000026' ,'000000000027' ,'000000000028' ,'000000000029' ,'000000000030'
,'000000000031' ,'000000000032' ,'000000000033' ,'000000000034' ,'000000000035'
,'000000000036' ,'000000000037' ,'000000000038' ,'000000000039' ,'000000000040'
)
);
create table WANT2 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000041' ,'000000000042' ,'000000000043' ,'000000000044' ,'000000000045'
,'000000000046' ,'000000000047' ,'000000000048' ,'000000000049' ,'000000000050'
,'000000000051' ,'000000000052' ,'000000000053' ,'000000000054' ,'000000000055'
,'000000000056' ,'000000000057' ,'000000000058' ,'000000000059' ,'000000000060'
,'000000000061' ,'000000000062' ,'000000000063' ,'000000000064' ,'000000000065'
,'000000000066' ,'000000000067' ,'000000000068' ,'000000000069' ,'000000000070'
,'000000000071' ,'000000000072' ,'000000000073' ,'000000000074' ,'000000000075'
,'000000000076' ,'000000000077' ,'000000000078' ,'000000000079' ,'000000000080'
)
);
create table WANT3 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000081' ,'000000000082' ,'000000000083' ,'000000000084' ,'000000000085'
,'000000000086' ,'000000000087' ,'000000000088' ,'000000000089' ,'000000000090'
,'000000000091' ,'000000000092' ,'000000000093' ,'000000000094' ,'000000000095'
,'000000000096' ,'000000000097' ,'000000000098' ,'000000000099' ,'000000000100'
)
);
First thing is you should probably talk to the owner of the Oracle database and see if they can grant you permission to upload the list into a temporary table in Oracle. That will go much faster.
Second you have assumed that you need to make a single string variable. Instead what you are asking is to create a PROGRAM. The easiest place to store a PROGRAM is in a FILE.
Let's assume you have a dataset named HAVE with the set of unique values of the variable ID that you want to use. You can then use that to write a program that has a series of CREATE TABLE statements. Once you have the program you then use %INCLUDE to run it.
So something like this:
%let limit=10000;
filename code temp;
data _null_;
set have end=eof ;
file code lrecl=80 ;
length qid $20 ;
qid=quote(trim(id),"'");
if mod(_n_,&limit)=1 then do;
fileno+1;
put 'create table WANT' fileno 'as '
/ 'select * from connection to con1 ('
/ 'select * from oracle_table where ID in '
/ '(' @
;
end;
else put ',' @;
put qid @;
if eof or mod(_n_,&limit)=0 then do;
put ')' / ');' ;
end;
if eof then call symputx('nfiles',fileno);
run;
Which you could run using code like this:
proc sql;
connect to oracle as con1 (connection details here);
%include code / source2;
quit;
And you could then put the multiple files back together with a data step like this:
data want;
set want1-want&nfiles;
run;
Here is an example of what the generated program would look like for a list of 100 id values split into a limit of 40 per dataset.
create table WANT1 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000001' ,'000000000002' ,'000000000003' ,'000000000004' ,'000000000005'
,'000000000006' ,'000000000007' ,'000000000008' ,'000000000009' ,'000000000010'
,'000000000011' ,'000000000012' ,'000000000013' ,'000000000014' ,'000000000015'
,'000000000016' ,'000000000017' ,'000000000018' ,'000000000019' ,'000000000020'
,'000000000021' ,'000000000022' ,'000000000023' ,'000000000024' ,'000000000025'
,'000000000026' ,'000000000027' ,'000000000028' ,'000000000029' ,'000000000030'
,'000000000031' ,'000000000032' ,'000000000033' ,'000000000034' ,'000000000035'
,'000000000036' ,'000000000037' ,'000000000038' ,'000000000039' ,'000000000040'
)
);
create table WANT2 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000041' ,'000000000042' ,'000000000043' ,'000000000044' ,'000000000045'
,'000000000046' ,'000000000047' ,'000000000048' ,'000000000049' ,'000000000050'
,'000000000051' ,'000000000052' ,'000000000053' ,'000000000054' ,'000000000055'
,'000000000056' ,'000000000057' ,'000000000058' ,'000000000059' ,'000000000060'
,'000000000061' ,'000000000062' ,'000000000063' ,'000000000064' ,'000000000065'
,'000000000066' ,'000000000067' ,'000000000068' ,'000000000069' ,'000000000070'
,'000000000071' ,'000000000072' ,'000000000073' ,'000000000074' ,'000000000075'
,'000000000076' ,'000000000077' ,'000000000078' ,'000000000079' ,'000000000080'
)
);
create table WANT3 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000081' ,'000000000082' ,'000000000083' ,'000000000084' ,'000000000085'
,'000000000086' ,'000000000087' ,'000000000088' ,'000000000089' ,'000000000090'
,'000000000091' ,'000000000092' ,'000000000093' ,'000000000094' ,'000000000095'
,'000000000096' ,'000000000097' ,'000000000098' ,'000000000099' ,'000000000100'
)
);
SAS code is MUCH easier to debug than MACRO code.
But it is not hard to make a macro variable such as you describe. However there is no need to first make some type of character variable first.
proc sql noprint;
select quote(trim(id),"'") into :mvar separated by ','
from have
;
quit;
The only trouble would be breaking the dataset into small enough sets that they list can fit into a single macro variable (64K bytes) or not exceed the limit that Oracle might have on how many constants you can have in a list for the IN operator or any overall limit on the size of a single query.
If you really want to complicate things by introducing macro logic you might be able to use macro code to generate OBS= and FIRSTOBS= dataset options to apply to the HAVE dataset in the query.
So assuming the limit is 10,000 ids and there are between 30,001 and 39,999 ids you might want something like:
%let limit=10000;
%let nfiles=3;
%do i=1 %to &nfiles;
proc sql noprint;
select quote(trim(id),"'") into :mvar separated by ','
from have(firstobs=%eval((&i-1)*&limit-1) obs=%eval(&i*&limit))
;
connect to oracle ..... ;
create table next as select * from connection to oracle (select .... where id in (&mvar));
quit;
proc append base=want data=next force;
run;
%end;
Which because of the use of the iterative %DO loop will require that you to define and macro and then call the macro.
Could I do something like this pseudocode?
Loop
build a string containing 10000 IDs from my SAS table
Loop
Use the string from above in the INLIST of an Oracle query. Insert results in to a SAS table
end loop
End Loop
@SAS_user_1 wrote:
Could I do something like this pseudocode?
Loop
build a string containing 10000 IDs from my SAS table
Loop
Use the string from above in the INLIST of an Oracle query. Insert results in to a SAS table
end loop
End Loop
@SAS_user_1 That's exactly what the sample code I've shared does - except not limited to a given number of IDs but as many as fit into 31KB.
@SAS_user_1 wrote:
Could I do something like this pseudocode?
Loop
build a string containing 10000 IDs from my SAS table
Loop
Use the string from above in the INLIST of an Oracle query. Insert results in to a SAS table
end loop
End Loop
You already have three examples that do just that.
But I recommend you use my first example. It is must easier to just let SAS do the looping for you. A simple data step is essentially a loop over all of the observations being read in.
@Tom When I run your code, it completes successfully to create the SQL, but all of the IDs are just 11 spaces and a period inside each set of single quotes.
The source table is listed and the log reads in the exact correct number of observations. Please see below for a slightly edited for privacy LOG:
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 %let limit=1000;
29
30 filename code temp;
31 data _null_;
32 set recips_20240317_a end=eof ;
33 file code lrecl=80 ;
34 length qid $20 ;
35 qid=quote(trim(id),"'");
36 if mod(_n_,&limit)=1 then do;
37 fileno+1;
38 put 'create table WANT' fileno 'as '
39 / 'select * from connection to con1 ('
40 / 'select * from schema.oracle_table where ID in '
41 / '(' @
42 ;
43 end;
44 else put ',' @;
45 put qid @;
46 if eof or mod(_n_,&limit)=0 then do;
47 put ')' / ');' ;
48 end;
49 if eof then call symputx('nfiles',fileno);
50 run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
35:18
NOTE: Variable id is uninitialized.
NOTE: The file CODE is:
2 The SAS System 15:35 Friday, March 15, 2024
Access Permission=-rw-rw----,
Last Modified=18Mar2024:10:57:59
NOTE: 42616 records were written to the file CODE.
The minimum record length was 1.
The maximum record length was 79.
NOTE: There were 207881 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.12 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
memory 600.75k
OS Memory 33436.00k
Timestamp 03/18/2024 10:57:59 AM
Step Count 27 Switch Count 0
Page Faults 0
Page Reclaims 13
Page Swaps 0
Voluntary Context Switches 7
Involuntary Context Switches 15
Block Input Operations 0
Block Output Operations 0
...
55 %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file ....
56 +create table WANT1 as
57 +select * from connection to con1 (
58 +select * from abms.int_elig_update_audit where billing_num in
59 +(' .' ,' .' ,' .' ,' .' ,' .'
60 +,' .' ,' .' ,' .' ,' .' ,' .'
61 +,' .' ,' .' ,' .' ,' .' ,' .'
62 +,' .' ,' .' ,' .' ,' .' ,' .'
63 +,' .' ,' .' ,' .' ,' .' ,' .'
64 +,' .' ,' .' ,' .' ,' .' ,' .'
65 +,' .' ,' .' ,' .' ,' .' ,' .'
66 +,' .' ,' .' ,' .' ,' .' ,' .'
67 +,' .' ,' .' ,' .' ,' .' ,' .'
68 +,' .' ,' .' ,' .' ,' .' ,' .'
69 +,' .' ,' .' ,' .' ,' .' ,' .'
70 +,' .' ,' .' ,' .' ,' .' ,' .'
71 +,' .' ,' .' ,' .' ,' .' ,' .'
72 +,' .' ,' .' ,' .' ,' .' ,' .'
Did you see the messages in the LOG?
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 35:18 NOTE: Variable id is uninitialized.
So either recips_20240317_a is not the right dataset to use or ID is not the right variable name to use.
This is a questions that comes up regularly like lately here.
Oracle accepts SQL syntax of up to 32KB per query. I suggest you send as many keys as possible per query because less queries with more keys will perform better.
Below some sample code that demonstrates how you could dynamically generate and execute such queries and then collect the result.
The sample code is self-contained and only writes to work so you can run it yourself to see in the SAS log what happens. The generated SQL here will also work with implicit pass-through.
data ora_big;
val='1';
do i=1 to 1000000;
key=put(i,z10.);
do j=1 to 100;
output;
end;
end;
run;
data sas_small;
do i=1 to 1000000 by 100;
key=put(i,z10.);
output;
end;
drop i;
run;
%macro sql_pump(keylist);
proc sql;
create table work.result_set as
select key,val
from ora_big
where key in (%unquote(&keylist))
;
quit;
proc datasets lib=work nolist;
append data=work.result_set base=work.want;
run;
delete result_set;
run;
quit;
%mend;
data _null_;
set sas_small end=last;
by key;
if last.key;
length keylist $32000;
retain keylist;
keylist=catx(',',keylist,cats("'",key,"'"));
if _n_=1 then
do;
call execute('proc datasets lib=work nolist nowarn; delete want; quit;');
end;
if last or length(keylist)>31000 then
do;
call execute( cats('%sql_pump(%nrstr(',keylist,'));') );
call missing(keylist);
output;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.