Hello
I am running simple proc sort on a big data set that is located in a permanent library.
It took more than 9 hours to complete this simple query so something is wrong here...
Please note that the WHERE statement is working on sas date variable called Update_date.
My questions:
Why did it happen?
What is the solution for it?
%let mon=2212;
%let month = %substr(&mon,3,2);
%put &month;
%let year=20%substr(&mon,1,2);
%put &year;
proc sort data=ENGINALB.ACCOUNT_MONTHLY_DATA(Where=(UPDATE_DATE ne '16MAY2022'd
AND year(UPDATE_DATE)=&year.
AND month(UPDATE_DATE)=&month.))
out=ACCOUNT_MONTHLY_DATA_RF_CS_12;
by UPDATE_DATE FK_APPLICATION FK_MONTHLY_DATA_ACCOUNT REFERENCE_DATE ;
Run;
Here is the Log related to this query
NOTE: Sorting was performed by the data source. NOTE: There were 23734185 observations read from the data set ENGINALB.ACCOUNT_MONTHLY_DATA_RF_CS. WHERE (UPDATE_DATE not = '16MAY2022'D) and (YEAR(UPDATE_DATE)=2022) and (MONTH(UPDATE_DATE)=12); NOTE: The data set WORK.ACCOUNT_MONTHLY_DATA_RF_CS_12 has 23734185 observations and 75 variables. NOTE: Compressing data set WORK.ACCOUNT_MONTHLY_DATA_RF_CS_12 decreased size by 96.37 percent. Compressed is 123105 pages; un-compressed would require 3390599 pages. NOTE: PROCEDURE SORT used (Total process time): real time 9:56.82 user cpu time 5:37.08 system cpu time 11.28 seconds memory 1599.25k OS Memory 22176.00k Timestamp 12/25/2022 12:07:04 PM Step Count 13 Switch Count 1 Page Faults 0 Page Reclaims 6542 Page Swaps 0 Voluntary Context Switches 53563 Involuntary Context Switches 857 Block Input Operations 0 Block Output Operations 0 NOTE: Missing values were generated as a result of performing an operation on missing values. Each place is given by: (Number of times) at (Line):(Column). 51215 at 26:24 679 at 26:76 679 at 26:184 679 at 26:201 NOTE: There were 23734185 observations read from the data set WORK.ACCOUNT_MONTHLY_DATA_RF_CS_12.
When I run it in another way then it took very short time
Data ACCOUNT_MONTHLY_DATA_RF_CS_12(Where=(UPDATE_DATE ne '16MAY2022'd ));
SET ENGINALB.ACCOUNT_MONTHLY_DATA_RF_CS(Where=(
year(UPDATE_DATE)=&year.
AND month(UPDATE_DATE)=&month.));
Run;
proc sort data=ACCOUNT_MONTHLY_DATA_RF_CS_12;
by UPDATE_DATE FK_APPLICATION FK_MONTHLY_DATA_ACCOUNT REFERENCE_DATE ;
Run;
Why this query is much quicker than the first one?
Which engine does your library ENGINALB use? Run
libname enginalb list;
and post the log..
Here is the Log after run : libname enginalb list;
1 The SAS System 20:18 Sunday, December 25, 2022 1 ;*';*";*/;quit;run; 2 OPTIONS PAGENO=MIN; 3 %LET _CLIENTTASKLABEL='A'; 4 %LET _CLIENTPROCESSFLOWNAME='Process Flow'; 5 %LET _CLIENTPROJECTPATH=''; 6 %LET _CLIENTPROJECTPATHHOST=''; 7 %LET _CLIENTPROJECTNAME=''; 8 %LET _SASPROGRAMFILE='K:\RRR\A.sas'; 9 %LET _SASPROGRAMFILEHOST='VSK2H010A3003'; 10 11 ODS _ALL_ CLOSE; 12 OPTIONS DEV=PNG; 13 GOPTIONS XPIXELS=0 YPIXELS=0; 14 FILENAME EGSR TEMP; 15 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR 16 STYLE=HTMLBlue 17 STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HTMLBlue.css") 18 NOGTITLE 19 NOGFOOTNOTE 20 GPATH=&sasworklocation 21 ENCODING=UTF8 22 options(rolap="on") 23 ; NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR 24 25 GOPTIONS ACCESSIBLE; 26 libname enginalb list; NOTE: Libref= ENGINALB Scope= Object Server Engine= SQLSVR Physical Name= EnginalBatch Schema/Owner= dbo 27 28 29 GOPTIONS NOACCESSIBLE; 30 %LET _CLIENTTASKLABEL=; 31 %LET _CLIENTPROCESSFLOWNAME=; 32 %LET _CLIENTPROJECTPATH=; 33 %LET _CLIENTPROJECTPATHHOST=; 34 %LET _CLIENTPROJECTNAME=; 35 %LET _SASPROGRAMFILE=; 36 %LET _SASPROGRAMFILEHOST=; 37 38 ;*';*";*/;quit;run; 39 ODS _ALL_ CLOSE; 40 41 42 QUIT; RUN; 43
As @ErikLund_Jensen already suspected, your SQL server seems to be seriously undersized for the data.
Move all processing to SAS.
Please post the log of your second faster run. Comparing the user CPU time would be useful.;
Hi @Ronein
The initial Note "Sorting was performed by the data source" is probably the clue here. You do not specify the data location, but it might be a database system residing on a server without sufficient memory to perform the sort. This would explain the petter performance when the sorting is done in SAS.
Thanks,
I want to understand please about way1:
Is the sort done on the permanent data set (libname ENGINALB) and only then it is saved in work library?
I was sure that first create data set in work library and only then sort the data.
I also think that it is not allowed to perform sort on permanent data set (by admin restrictions).
%let mon=2212;
%let month = %substr(&mon,3,2);
%put &month;
%let year=20%substr(&mon,1,2);
%put &year;
/***Way1-Very long run time***/
/***Way1-Very long run time***/
/***Way1-Very long run time***/
proc sort data=ENGINALB.ACCOUNT_MONTHLY_DATA(Where=(UPDATE_DATE ne '16MAY2022'd AND year(UPDATE_DATE)=&year. AND month(UPDATE_DATE)=&month.))
out=ACCOUNT_MONTHLY_DATA_RF_CS_12;
by UPDATE_DATE FK_APPLICATION FK_MONTHLY_DATA_ACCOUNT REFERENCE_DATE ;
Run;
/***Way2-quick run time***/
/***Way2-quick run time***/
/***Way2-quick run time***/
Data ACCOUNT_MONTHLY_DATA_RF_CS_12(Where=(UPDATE_DATE ne '16MAY2022'd ));
SET ENGINALB.ACCOUNT_MONTHLY_DATA_RF_CS(Where=(
year(UPDATE_DATE)=&year.
AND month(UPDATE_DATE)=&month.));
Run;
proc sort data=ACCOUNT_MONTHLY_DATA_RF_CS_12;
by UPDATE_DATE FK_APPLICATION FK_MONTHLY_DATA_ACCOUNT REFERENCE_DATE ;
Run;
The SQL server table is not sorted, but its contents are sorted by SQL Server before being sent to SAS. This is what takes so long.
The table in the database will stay as it is, but the data contained in it will be sorted (in-memory or in temporary space when out of memory) by the database server before being sent to SAS.
Explicitly sorting in SAS avoids this, and the DB needs only do a single sequential read while applying the WHERE.
Is the sort done on sql server ? Why?
By default SAS will pass processing back to the external database you are reading as it is usually more efficient to do it that way.
I'd also suggest trying a different WHERE clause to see if that makes any difference:
where UPDATE_DATE between '01Dec2022'd and '31Dec2022'd
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.