BookmarkSubscribeRSS Feed
ScottBass
Rhodochrosite | Level 12

I have two SQL Server tables with 100M+ rows.  I need to do a PROC COMPARE on the two tables.  

 

The most performant approach has been to use explicit pass through views, then PROC COMPARE.

 

Partial code:

 

proc sql;
   connect using tmp;
   create view old_mapped as
   select *
   from connection to tmp (
SELECT  *
  FROM old.RLDX_DEMOGRAPHIC
 WHERE hospital_type in (1,2,4)
ORDER BY
       facility_identifier
      ,stay_number
      ,episode_sequence_number
      ,rldx_audit_key
   );
quit;

 

I get much better performance when SQL Server does the filtering and sorting for PROC COMPARE than other approaches.

 

This isn't all the code, and this program also creates some SAS datasets downstream of the view creation.

 

Then I "accidentally" clicked the Output Data tab.  I say "accidentally" because I kind of knew what would happen.  Since the above view was first in the list...well, 30 minutes later EG returns control to me, and I can then select the SAS dataset that I really want to look at.  During that time I couldn't even click the Save icon to save the project.

 

I've fiddled with the EG options for Results and Data -> General but no joy.

 

Any ideas I'm not thinking of to make EG less sucky in this scenario?  Does EG 8.1 still have this annoying behaviour?  Perhaps a timeout on the Output window display would be a useful option?

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
7 REPLIES 7
SASKiwi
PROC Star

Try the options under Data - Performance.

 

You can set a limit of the number of rows SAS/ACCESS table returns in an EG grid (I'm using 8.2) and the default value is 10,000 rows.

 

I'd also recomment setting the option - When output tables are generated, automatically open - no tables. 

ScottBass
Rhodochrosite | Level 12

@SASKiwi wrote:

Try the options under Data - Performance.

 

You can set a limit of the number of rows SAS/ACCESS table returns in an EG grid (I'm using 8.2) and the default value is 10,000 rows.

 

I'd also recomment setting the option - When output tables are generated, automatically open - no tables. 


I already have the 10,000 row setting enabled.  I don't see the option When output tables are generated, automatically open - no tables in my version of EG.

 

The problem is the ORDER BY in the explicit passthrough on the 100M row table takes a while before it even starts returning rows.  I'd be happy if EG had an interrupt option (I dunno say Cntl-C) on the data viewer.  Or allowed me to click the Output Data tab, then select the desired dataset, without first displaying the first entry.  Or something that would allow me to regain control of EG.

 

SQL Server Management Studio -> long running query.  Click Stop Execution.  Boom, query stopped (ok if it's an update query you can get a long wait during rollback).  But why can't EG R&D take some hints from other IDE's?

 

Here is some code that sort of illustrates the issue.

 

In EG, create Program Entry #1:

 

* libname workspde spde "%sysfunc(pathname(work))" temp=yes;

data 
   workspde.a
   workspde.b
   workspde.c
   ;
   x=1;
   y=1;
run;

 

Just execute the libname once.

 

Program Entry #2:

 

%macro makeview(x);
data v_&x / view=v_&x;
   set workspde.&x;
   by y;
run;
%mend;
%makeview(a);
%makeview(b);
%makeview(c);

data 
   workspde.a
   workspde.b
   workspde.c
   ;
   do x=1 to 1E9;
      y=int(ranuni(0)*1E12);
      output workspde.a;
      y=int(ranuni(0)*1E12);
      output workspde.b;
      y=int(ranuni(0)*1E12);
      output workspde.c;
   end;
run;

 

This now has V_A as the first entry in the list.  Click the Output Data tab.  Since V_A is first in the list, you'll wait for the SPDE engine to sort the data.  Once it finally finishes, you can select dataset workspde.b, the one you really wanted to look at.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
SASKiwi
PROC Star

@ScottBass  - In your EG version the option to limit output datasets is probably this:

screenshot20.jpg

Kurt_Bremser
Super User

Since you have a pass-through view, the view has to be executed in its entirety by the SQL server before SAS can do any subsetting.

What IS annoying is that EG does not move this wait into a background thread and gives you back control to do other things (the workspace server will be blocked anyway, so any SAS code execution will have to wait).

ScottBass
Rhodochrosite | Level 12

@Kurt_Bremser wrote:

Since you have a pass-through view, the view has to be executed in its entirety by the SQL server before SAS can do any subsetting.

What IS annoying is that EG does not move this wait into a background thread and gives you back control to do other things (the workspace server will be blocked anyway, so any SAS code execution will have to wait).


Why?  It's software.  Why can't it be changed and improved?  Why can I have a long running query in SQL Server Management Studio, click Stop Query, and every single time without fail it immediately stops (well except for update queries but that's irrelevant to this question).  If the limitation is in the ODBC driver then find some other solution:  click Output Data tab to display the dropdown of data sets/views, then click the desired data set; have an option to timeout the data viewer; have a keystroke to interrupt the data viewer; ask Microsoft how they manage to do it.  Anything but the current behaviour.  Do some usability studies and come up with a better solution.

 

Do you really think EG "hanging" for 30 minutes is acceptable behaviour?

 

And the same thing applies to when I want to stop a process that is running against an RDBMS (and sometimes with the SPDE engine).  My "stop now" may take several hours.


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
TomKari
Onyx | Level 15

I agree with @Kurt_Bremser. I used to work with a database with very large tables. If I double clicked a table, it opened immediately. If I clicked a view, as Kurt says it had to instantiate the entire view before it returned anything (which sounds weird but makes complete sense, technically). Test with small volumes, and only use full volumes when you're sure it works (and then go for lunch!).

 

Tom

Kurt_Bremser
Super User

You have an ORDER BY in the view. This forces the DBMS to sort the whole table before it can return anything. You might be able to speed it up if a fitting index was defined on the table in the DBMS.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1173 views
  • 0 likes
  • 4 in conversation