BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SAS_user_1
Calcite | Level 5

I have a list of IDs in a SAS dataset, which I uploaded from Excel.  The IDs are alphanumeric, characters, such as 012345ABCDEF.

 

I have 200000 observations in this IDs SAS dataset.  I need to accomplish 2 things:

 

1.  Take 10000 of the IDs at a time, surround each ID with single quotes, put a comma between them, and put them into a single text string variable.   So they would look like this:

 

'012345ABCDEF', '024680DEFGHI'

 

2.  I'll then take that variable and use it in my passthru query IN statement, such as:

 

proc sql;

CONNECT TO ORACLE as con1

 (connection details here);

create table WANT as

select * from connection to con1 

(

select * from oracle_table where ID in (string variable I created above)

 

);

 

I'll repeat steps 1 and 2 above until I iterate through all 200K of my SAS table IDs.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

First thing is you should probably talk to the owner of the Oracle database and see if they can grant you permission to upload the list into a temporary table in Oracle.  That will go much faster.

 

Second you have assumed that you need to make a single string variable.  Instead what you are asking is to create a PROGRAM.  The easiest place to store a PROGRAM is in a FILE.

 

Let's assume you have a dataset named HAVE with the set of unique values of the variable ID that you want to use.  You can then use that to write a program that has a series of CREATE TABLE statements.  Once you have the program you then use %INCLUDE to run it.

 

So something like this:

%let limit=10000;

filename code temp;
data _null_;
  set have end=eof ;
  file code lrecl=80 ;
  length qid $20 ;
  qid=quote(trim(id),"'");
  if mod(_n_,&limit)=1 then do;
    fileno+1;
    put 'create table WANT' fileno 'as '
      / 'select * from connection to con1 ('
      / 'select * from oracle_table where ID in '
      / '(' @
    ;
  end;
  else put ',' @; 
  put qid @;
  if eof or mod(_n_,&limit)=0 then do;
    put ')' / ');' ;
  end;
  if eof then call symputx('nfiles',fileno);
run;

Which you could run using code like this:

proc sql;
connect to oracle as con1 (connection details here);
%include code / source2;
quit;

And you could then put the multiple files back together with a data step like this:

data want;
  set want1-want&nfiles;
run;

Here is an example of what the generated program would look like for a list of 100 id values split into a limit of 40 per dataset.

create table WANT1 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000001' ,'000000000002' ,'000000000003' ,'000000000004' ,'000000000005'
,'000000000006' ,'000000000007' ,'000000000008' ,'000000000009' ,'000000000010'
,'000000000011' ,'000000000012' ,'000000000013' ,'000000000014' ,'000000000015'
,'000000000016' ,'000000000017' ,'000000000018' ,'000000000019' ,'000000000020'
,'000000000021' ,'000000000022' ,'000000000023' ,'000000000024' ,'000000000025'
,'000000000026' ,'000000000027' ,'000000000028' ,'000000000029' ,'000000000030'
,'000000000031' ,'000000000032' ,'000000000033' ,'000000000034' ,'000000000035'
,'000000000036' ,'000000000037' ,'000000000038' ,'000000000039' ,'000000000040'
)
);
create table WANT2 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000041' ,'000000000042' ,'000000000043' ,'000000000044' ,'000000000045'
,'000000000046' ,'000000000047' ,'000000000048' ,'000000000049' ,'000000000050'
,'000000000051' ,'000000000052' ,'000000000053' ,'000000000054' ,'000000000055'
,'000000000056' ,'000000000057' ,'000000000058' ,'000000000059' ,'000000000060'
,'000000000061' ,'000000000062' ,'000000000063' ,'000000000064' ,'000000000065'
,'000000000066' ,'000000000067' ,'000000000068' ,'000000000069' ,'000000000070'
,'000000000071' ,'000000000072' ,'000000000073' ,'000000000074' ,'000000000075'
,'000000000076' ,'000000000077' ,'000000000078' ,'000000000079' ,'000000000080'
)
);
create table WANT3 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000081' ,'000000000082' ,'000000000083' ,'000000000084' ,'000000000085'
,'000000000086' ,'000000000087' ,'000000000088' ,'000000000089' ,'000000000090'
,'000000000091' ,'000000000092' ,'000000000093' ,'000000000094' ,'000000000095'
,'000000000096' ,'000000000097' ,'000000000098' ,'000000000099' ,'000000000100'
)
);

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

First thing is you should probably talk to the owner of the Oracle database and see if they can grant you permission to upload the list into a temporary table in Oracle.  That will go much faster.

 

Second you have assumed that you need to make a single string variable.  Instead what you are asking is to create a PROGRAM.  The easiest place to store a PROGRAM is in a FILE.

 

Let's assume you have a dataset named HAVE with the set of unique values of the variable ID that you want to use.  You can then use that to write a program that has a series of CREATE TABLE statements.  Once you have the program you then use %INCLUDE to run it.

 

So something like this:

%let limit=10000;

filename code temp;
data _null_;
  set have end=eof ;
  file code lrecl=80 ;
  length qid $20 ;
  qid=quote(trim(id),"'");
  if mod(_n_,&limit)=1 then do;
    fileno+1;
    put 'create table WANT' fileno 'as '
      / 'select * from connection to con1 ('
      / 'select * from oracle_table where ID in '
      / '(' @
    ;
  end;
  else put ',' @; 
  put qid @;
  if eof or mod(_n_,&limit)=0 then do;
    put ')' / ');' ;
  end;
  if eof then call symputx('nfiles',fileno);
run;

Which you could run using code like this:

proc sql;
connect to oracle as con1 (connection details here);
%include code / source2;
quit;

And you could then put the multiple files back together with a data step like this:

data want;
  set want1-want&nfiles;
run;

Here is an example of what the generated program would look like for a list of 100 id values split into a limit of 40 per dataset.

create table WANT1 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000001' ,'000000000002' ,'000000000003' ,'000000000004' ,'000000000005'
,'000000000006' ,'000000000007' ,'000000000008' ,'000000000009' ,'000000000010'
,'000000000011' ,'000000000012' ,'000000000013' ,'000000000014' ,'000000000015'
,'000000000016' ,'000000000017' ,'000000000018' ,'000000000019' ,'000000000020'
,'000000000021' ,'000000000022' ,'000000000023' ,'000000000024' ,'000000000025'
,'000000000026' ,'000000000027' ,'000000000028' ,'000000000029' ,'000000000030'
,'000000000031' ,'000000000032' ,'000000000033' ,'000000000034' ,'000000000035'
,'000000000036' ,'000000000037' ,'000000000038' ,'000000000039' ,'000000000040'
)
);
create table WANT2 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000041' ,'000000000042' ,'000000000043' ,'000000000044' ,'000000000045'
,'000000000046' ,'000000000047' ,'000000000048' ,'000000000049' ,'000000000050'
,'000000000051' ,'000000000052' ,'000000000053' ,'000000000054' ,'000000000055'
,'000000000056' ,'000000000057' ,'000000000058' ,'000000000059' ,'000000000060'
,'000000000061' ,'000000000062' ,'000000000063' ,'000000000064' ,'000000000065'
,'000000000066' ,'000000000067' ,'000000000068' ,'000000000069' ,'000000000070'
,'000000000071' ,'000000000072' ,'000000000073' ,'000000000074' ,'000000000075'
,'000000000076' ,'000000000077' ,'000000000078' ,'000000000079' ,'000000000080'
)
);
create table WANT3 as
select * from connection to con1 (
select * from oracle_table where ID in
('000000000081' ,'000000000082' ,'000000000083' ,'000000000084' ,'000000000085'
,'000000000086' ,'000000000087' ,'000000000088' ,'000000000089' ,'000000000090'
,'000000000091' ,'000000000092' ,'000000000093' ,'000000000094' ,'000000000095'
,'000000000096' ,'000000000097' ,'000000000098' ,'000000000099' ,'000000000100'
)
);
SAS_user_1
Calcite | Level 5
1. Can't create table in Oracle, and I agree that would be the best route.

2. I will try what you have posted here; thank you!

3. I do think that what I said, would work: I just need a string variable, which has multiple IDs surrounded by embedded single quotes and separated by commas. I think this can be built in a macro, I just haven't figured it out yet.
Tom
Super User Tom
Super User

SAS code is MUCH easier to debug than MACRO code.

 

But it is not hard to make a macro variable such as you describe.  However there is no need to first make some type of character variable first.

proc sql noprint;
select quote(trim(id),"'") into :mvar separated by ','
from have
;
quit;

The only trouble would be breaking the dataset into small enough sets that they list can fit into a single macro variable (64K bytes) or not exceed the limit that Oracle might have on how many constants you can have in a list for the IN operator or any overall limit on the size of a single query.

 

If you really want to complicate things by introducing macro logic you might be able to use macro code to generate OBS= and FIRSTOBS= dataset options to apply to the HAVE dataset in the query.  

So assuming the limit is 10,000 ids and there are between 30,001 and 39,999 ids you might want something like:

%let limit=10000;
%let nfiles=3;
%do i=1 %to &nfiles;
proc sql noprint;
select quote(trim(id),"'") into :mvar separated by ','
from have(firstobs=%eval((&i-1)*&limit-1) obs=%eval(&i*&limit))
;
connect to oracle ..... ;
create table next as select * from connection to oracle (select .... where id in (&mvar));
quit;
proc append base=want data=next force;
run;
%end;

Which because of the use of the iterative %DO loop will require that you to define and macro and then call the macro.

 

 

SAS_user_1
Calcite | Level 5

Could I do something like this pseudocode?

Loop
   build a string containing 10000 IDs from my SAS table
   Loop
      Use the string from above in the INLIST of an Oracle query.  Insert results in to a SAS table
   end loop
End Loop

Patrick
Opal | Level 21

@SAS_user_1 wrote:

Could I do something like this pseudocode?

Loop
   build a string containing 10000 IDs from my SAS table
   Loop
      Use the string from above in the INLIST of an Oracle query.  Insert results in to a SAS table
   end loop
End Loop


@SAS_user_1  That's exactly what the sample code I've shared does - except not limited to a given number of IDs but as many as fit into 31KB.

Tom
Super User Tom
Super User

@SAS_user_1 wrote:

Could I do something like this pseudocode?

Loop
   build a string containing 10000 IDs from my SAS table
   Loop
      Use the string from above in the INLIST of an Oracle query.  Insert results in to a SAS table
   end loop
End Loop


You already have three examples that do just that.

 

But I recommend you use my first example.  It is must easier to just let SAS do the looping for you.  A simple data step is essentially a loop over all of the observations being read in.

SAS_user_1
Calcite | Level 5

@Tom When I run your code, it completes successfully to create the SQL, but all of the IDs are just 11 spaces and a period inside each set of single quotes.

The source table is listed and the log reads in the exact correct number of observations.  Please see below for a slightly edited for privacy LOG:

NOTE: Writing HTML5(EGHTML) Body file: EGHTML

27        

28         %let limit=1000;

29        

30         filename code temp;

 

 

31         data _null_;

32           set recips_20240317_a end=eof ;

33           file code lrecl=80 ;

34           length qid $20 ;

35           qid=quote(trim(id),"'");

36           if mod(_n_,&limit)=1 then do;

37             fileno+1;

38             put 'create table WANT' fileno 'as '

39               / 'select * from connection to con1 ('

40               / 'select * from schema.oracle_table where ID in '

41               / '(' @

42             ;

43           end;

44           else put ',' @;

45           put qid @;

46           if eof or mod(_n_,&limit)=0 then do;

47             put ')' / ');' ;

48           end;

49           if eof then call symputx('nfiles',fileno);

50         run;

 

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).

      35:18  

NOTE: Variable id is uninitialized.

NOTE: The file CODE is:

2                                                          The SAS System                               15:35 Friday, March 15, 2024

 

      Access Permission=-rw-rw----,

      Last Modified=18Mar2024:10:57:59

 

NOTE: 42616 records were written to the file CODE.

      The minimum record length was 1.

      The maximum record length was 79.

NOTE: There were 207881 observations read from the data set WORK.HAVE.

NOTE: DATA statement used (Total process time):

      real time           0.12 seconds

      user cpu time       0.03 seconds

      system cpu time     0.00 seconds

      memory              600.75k

      OS Memory           33436.00k

      Timestamp           03/18/2024 10:57:59 AM

      Step Count                        27  Switch Count  0

      Page Faults                       0

      Page Reclaims                     13

      Page Swaps                        0

      Voluntary Context Switches        7

      Involuntary Context Switches      15

      Block Input Operations            0

      Block Output Operations           0

     

                

                ...

               

                55         %include code / source2;

NOTE: %INCLUDE (level 1) file CODE is file ....

56        +create table WANT1 as

57        +select * from connection to con1 (

58        +select * from abms.int_elig_update_audit where billing_num in

59        +('           .' ,'           .' ,'           .' ,'           .' ,'           .'

60        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

61        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

62        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

63        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

64        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

65        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

66        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

67        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

68        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

69        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

70        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

71        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

72        +,'           .' ,'           .' ,'           .' ,'           .' ,'           .'

Tom
Super User Tom
Super User

Did you see the messages in the LOG?

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).

      35:18  
NOTE: Variable id is uninitialized.

So either recips_20240317_a  is not the right dataset to use or ID is not the right variable name to use.

SAS_user_1
Calcite | Level 5
Perfect. Running now. Gonna take a minute because it's a lot of data. But no errors.
Patrick
Opal | Level 21

This is a questions that comes up regularly like lately here

Oracle accepts SQL syntax of up to 32KB per query. I suggest you send as many keys as possible per query because less queries with more keys will perform better. 

 

Below some sample code that demonstrates how you could dynamically generate and execute such queries and then collect the result.

The sample code is self-contained and only writes to work so you can run it yourself to see in the SAS log what happens. The generated SQL here will also work with implicit pass-through.

data ora_big;
  val='1';
  do i=1 to 1000000;
    key=put(i,z10.);
    do j=1 to 100;
      output;
    end;
  end;
run;

data sas_small;
  do i=1 to 1000000 by 100;
    key=put(i,z10.);
    output;
  end;
  drop i;
run;

%macro sql_pump(keylist);
  proc sql;
    create table work.result_set as
    select key,val
    from ora_big
    where key in (%unquote(&keylist))
    ;
  quit;
  proc datasets lib=work nolist;
    append data=work.result_set base=work.want;
    run;
    delete result_set;
    run;
  quit;
%mend;

data _null_;
  set sas_small end=last;
  by key;
  if last.key;

  length keylist $32000;
  retain keylist;
  keylist=catx(',',keylist,cats("'",key,"'"));

  if _n_=1 then
    do;
      call execute('proc datasets lib=work nolist nowarn; delete want; quit;');
    end;
  if last or length(keylist)>31000 then
    do;
      call execute( cats('%sql_pump(%nrstr(',keylist,'));') );
      call missing(keylist);
      output;
    end;
run;

Patrick_0-1710718087069.png

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1141 views
  • 0 likes
  • 3 in conversation