BookmarkSubscribeRSS Feed
Ronein
Onyx | Level 15

Hello

I am running simple proc sort on a big data set that is located in a permanent library.

It took more than 9 hours to complete this simple query so something is wrong here...

 Please note that the WHERE statement is working on sas date variable called Update_date.

My questions:

Why did it happen?

What is the solution for it?

%let mon=2212;

%let month = %substr(&mon,3,2);
%put &month;
%let year=20%substr(&mon,1,2);
%put &year;

proc sort data=ENGINALB.ACCOUNT_MONTHLY_DATA(Where=(UPDATE_DATE ne '16MAY2022'd
AND year(UPDATE_DATE)=&year.
AND month(UPDATE_DATE)=&month.)) out=ACCOUNT_MONTHLY_DATA_RF_CS_12; by UPDATE_DATE FK_APPLICATION FK_MONTHLY_DATA_ACCOUNT REFERENCE_DATE ; Run;

 

Here is the Log related to this query

NOTE: Sorting was performed by the data source.
NOTE: There were 23734185 observations read from the data set ENGINALB.ACCOUNT_MONTHLY_DATA_RF_CS.
      WHERE (UPDATE_DATE not = '16MAY2022'D) and (YEAR(UPDATE_DATE)=2022) and (MONTH(UPDATE_DATE)=12);
NOTE: The data set WORK.ACCOUNT_MONTHLY_DATA_RF_CS_12 has 23734185 observations and 75 variables.
NOTE: Compressing data set WORK.ACCOUNT_MONTHLY_DATA_RF_CS_12 decreased size by 96.37 percent. 
      Compressed is 123105 pages; un-compressed would require 3390599 pages.
NOTE: PROCEDURE SORT used (Total process time):
      real time           9:56.82
      user cpu time       5:37.08
      system cpu time     11.28 seconds
      memory              1599.25k
      OS Memory           22176.00k
      Timestamp           12/25/2022 12:07:04 PM
      Step Count                        13  Switch Count  1
      Page Faults                       0
      Page Reclaims                     6542
      Page Swaps                        0
      Voluntary Context Switches        53563
      Involuntary Context Switches      857
      Block Input Operations            0
      Block Output Operations           0
      


NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      51215 at 26:24   679 at 26:76     679 at 26:184    679 at 26:201    
NOTE: There were 23734185 observations read from the data set WORK.ACCOUNT_MONTHLY_DATA_RF_CS_12.

 

 

When I run it in another way then it took very short time

Data ACCOUNT_MONTHLY_DATA_RF_CS_12(Where=(UPDATE_DATE  ne '16MAY2022'd ));
SET ENGINALB.ACCOUNT_MONTHLY_DATA_RF_CS(Where=(
year(UPDATE_DATE)=&year.
AND month(UPDATE_DATE)=&month.));
Run;

proc sort data=ACCOUNT_MONTHLY_DATA_RF_CS_12;
by   UPDATE_DATE    FK_APPLICATION       FK_MONTHLY_DATA_ACCOUNT    REFERENCE_DATE  ;
Run;

Why this query is much quicker than the first one? 

12 REPLIES 12
Ronein
Onyx | Level 15

Here is the Log after run : libname enginalb list;

1                                                          The SAS System                            20:18 Sunday, December 25, 2022

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='A';
4          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5          %LET _CLIENTPROJECTPATH='';
6          %LET _CLIENTPROJECTPATHHOST='';
7          %LET _CLIENTPROJECTNAME='';
8          %LET _SASPROGRAMFILE='K:\RRR\A.sas';
9          %LET _SASPROGRAMFILEHOST='VSK2H010A3003';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=PNG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         FILENAME EGSR TEMP;
15         ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16             STYLE=HTMLBlue
17             STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css")
18             NOGTITLE
19             NOGFOOTNOTE
20             GPATH=&sasworklocation
21             ENCODING=UTF8
22             options(rolap="on")
23         ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         libname enginalb list;
NOTE: Libref=   ENGINALB 
      Scope=    Object Server
      Engine=   SQLSVR
      Physical Name= EnginalBatch
      Schema/Owner= dbo
27         
28         
29         GOPTIONS NOACCESSIBLE;
30         %LET _CLIENTTASKLABEL=;
31         %LET _CLIENTPROCESSFLOWNAME=;
32         %LET _CLIENTPROJECTPATH=;
33         %LET _CLIENTPROJECTPATHHOST=;
34         %LET _CLIENTPROJECTNAME=;
35         %LET _SASPROGRAMFILE=;
36         %LET _SASPROGRAMFILEHOST=;
37         
38         ;*';*";*/;quit;run;
39         ODS _ALL_ CLOSE;
40         
41         
42         QUIT; RUN;
43         
SASKiwi
PROC Star

Please post the log of your second faster run. Comparing the user CPU time would be useful.;

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Ronein 

The initial Note "Sorting was performed by the data source" is probably the clue here. You do not specify the data location, but it might be a database system residing on a server without sufficient memory to perform the sort. This would explain the petter performance when the sorting is done in SAS.

Ronein
Onyx | Level 15

Thanks,

I want to understand please about way1:

Is the sort done on the permanent data set (libname ENGINALB)  and only then it is saved in work library?

I was sure that first create data set in work library and only then sort the data.

I also think that it is not allowed to perform sort on permanent data set (by admin restrictions).

%let mon=2212;
%let month = %substr(&mon,3,2);
%put &month;
%let year=20%substr(&mon,1,2);
%put &year;

/***Way1-Very long run time***/
/***Way1-Very long run time***/
/***Way1-Very long run time***/
proc sort data=ENGINALB.ACCOUNT_MONTHLY_DATA(Where=(UPDATE_DATE  ne '16MAY2022'd AND year(UPDATE_DATE)=&year. AND month(UPDATE_DATE)=&month.))
out=ACCOUNT_MONTHLY_DATA_RF_CS_12;
by   UPDATE_DATE    FK_APPLICATION       FK_MONTHLY_DATA_ACCOUNT    REFERENCE_DATE  ;
Run;

/***Way2-quick run time***/
/***Way2-quick run time***/
/***Way2-quick run time***/
Data ACCOUNT_MONTHLY_DATA_RF_CS_12(Where=(UPDATE_DATE  ne '16MAY2022'd ));
SET ENGINALB.ACCOUNT_MONTHLY_DATA_RF_CS(Where=(
year(UPDATE_DATE)=&year.
AND month(UPDATE_DATE)=&month.));
Run;

proc sort data=ACCOUNT_MONTHLY_DATA_RF_CS_12;
by   UPDATE_DATE    FK_APPLICATION       FK_MONTHLY_DATA_ACCOUNT    REFERENCE_DATE  ;
Run;
Ronein
Onyx | Level 15
As I understand the permanent data set is located in SQL SERVER data base.
Is it the reason for the long run of proc sort?
If the permanent data set was located in SAS server, then it wouldn't take long run?

What do you mean when you say that contents are sorted by sql server but table is not sorted by sql server?
(What does it mean contents?)As I know sort is done on the data ...
Kurt_Bremser
Super User

The table in the database will stay as it is, but the data contained in it will be sorted (in-memory or in temporary space when out of memory) by the database server before being sent to SAS.

Explicitly sorting in SAS avoids this, and the DB needs only do a single sequential read while applying the WHERE.

Ronein
Onyx | Level 15

Is the sort done on sql server ? Why?

SASKiwi
PROC Star

By default SAS will pass processing back to the external database you are reading as it is usually more efficient to do it that way.

SASKiwi
PROC Star

I'd also suggest trying a different WHERE clause to see if that makes any difference:

where UPDATE_DATE between '01Dec2022'd and '31Dec2022'd

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2384 views
  • 4 likes
  • 4 in conversation