Hi team,
I have a job that currently starts at 02:49:17 and does not finish until 07:04:57. Because of this overrun, users try to access the SAS datasets from their end while the job is still running, which causes failures and raises the chance that the datasets will not be updated with correct data.
I cannot change the schedule, as it depends on other datasets as well.
So my thought is to improve the performance of the code so that it runs faster.
My knowledge here is quite basic, so I am requesting your thoughts, as I believe you will have better and more efficient options.
The code contains many PROC SORT steps. Can I do something about this, for example with the TAGSORT option, or do you have any other advice?
Sorting is one of the most common and time intensive activities.
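On the TAGSORT question specifically: TAGSORT makes PROC SORT keep only the BY variables and observation numbers ("tags") in its temporary utility files, which can reduce temporary disk usage when the observations are much wider than the sort keys, though it often costs extra elapsed time. A minimal sketch (the data set name and BY variables are placeholders):

proc sort data=work.big_input tagsort;
   by key1 key2;
run;

It only helps when temporary sort space is the real constraint, so test it against a normal sort.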
Are you using indexed data sets? Using indexes instead of sorts can speed some activities. Are you rebuilding the same data every day? Appending to existing data?
Does the data reside in an external DBMS?
What exactly are you doing that requires multiple sorts? Can you provide specific examples of the code using the different sorts?
Sometimes a careful choice of sort order allows the same data to be reused without re-sorting. And if you are doing multiple summaries of the data, then CLASS variables in a procedure like Proc Summary can produce several summaries in one pass, and you select the one you need by filtering on the created _TYPE_ variable.
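As an illustration of that last idea, here is a minimal sketch (the data set and variable names are invented) that builds every CLASS combination in one pass and then selects the summary level needed via _TYPE_:

proc summary data=work.sales;
   class region product;
   var amount;
   output out=work.all_sums sum=total_amount;
run;

/* With CLASS region product, _TYPE_ encodes the summary level:
   0 = overall, 1 = product only, 2 = region only, 3 = region*product */
data work.by_region;
   set work.all_sums(where=(_type_=2));
run;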
Some details may help.
Given what you've posted, my suggestion would be to search for "efficient coding strategies" or "make code faster" on lexjansen.com and read through the papers there, for example:
50 Ways to Make Your SAS® Code Execute More Efficiently
Twenty Ways to Run Your SAS® Program Faster and Use Less Space
https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3684-2019.pdf
Thanks, I will check and try.
May I know if there is any other approach you can see and suggest, as I have only a basic level of knowledge on this issue?
As BallardW indicates you need to provide more details. Vague questions get vague answers.
To make suggestions on possible improvements for your code, we need to see it. Since the log contains performance-relevant information, we need to see it too.
If you can identify single steps that take up most of the time, these need to be worked on first.
Undertaking a large job for production use with large data while having only basic knowledge is, of course, the wrong way to go. You build your skills with small tasks first, then apply the experience you gained to the bigger issues later.
If time is of the essence, I suggest you hire a consultant and then learn from the methods used by said consultant. I had some of my important early "aha" moments like this.
Hi @Kurt_Bremser ,
The code below takes far more time (24:39.42) than the rest; the other sections of code take only seconds, seven minutes at most. Would you please suggest any options to improve it?
data 2a (drop=ty amt);
   set ay;
   by num no no1;
   if ty = 'AA' then ccb = amt;
   else if ty = 'BB' then cles = amt;
   else if ty = 'CCC' then nut = amt;
   else if ty = 'DDD' then amp = amt;
   else if ty = 'EE' then ss = amt;
   else if ty = 'FFF' then rat = amt;
   else if ty = 'GGG' then xcv = amt;
   else if ty = 'HHH' then wee = amt;
   else if ty = 'II' then ler = amt;
   else if ty = 'JJJ' then mit = amt;
   else if ty = 'KK' and hhhoi in ('SETL' 'SET1' 'SET2' 'SET3') then do;
      if gjhrt = '16' then ghfcv = 'SD';
      else ghfcv = 'ED';
   end;
   retain ccb amp cles nut ler mit
          ss rat xcv wee 0 ghfcv " ";
   if last.no1 then do;
      output;
      ccb = 0;
      cles = 0;
      nut = 0;
      amp = 0;
      ss = 0;
      rat = 0;
      xcv = 0;
      wee = 0;
      ler = 0;
      mit = 0;
      ghfcv = " ";
   end;
run;
proc datasets lib=work memtype=data;
delete ay;
run;
quit;
Looks sort of like a transpose. Perhaps you don't need to do the transpose at all?
How do you use the dataset, 2A? Is it used just once or multiple times?
Which value of TY is the most common?
If the more common values of the variable TY appear first in that block of IF/THEN/ELSE, then fewer IF conditions have to be evaluated.
For example, if 'JJJ' were the most common value, then you execute 9 additional comparisons per observation compared to a block that starts with
If ty = 'JJJ' then ...
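Purely as an illustration, the same branching could also be written as a SELECT block, with whatever turns out to be the most frequent TY value tested first ('JJJ' is only an assumption here):

select (ty);
   when ('JJJ') mit  = amt;   /* put the most common value first */
   when ('AA')  ccb  = amt;
   when ('BB')  cles = amt;
   /* ... remaining TY values, ideally in descending frequency ... */
   otherwise if ty = 'KK' and hhhoi in ('SETL' 'SET1' 'SET2' 'SET3') then do;
      if gjhrt = '16' then ghfcv = 'SD';
      else ghfcv = 'ED';
   end;
end;

The WHEN clauses are still checked in order, so any gain comes from the ordering, not from SELECT itself; SELECT just makes that ordering easier to read and maintain.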
I am guessing you tweaked the code somewhat because WORK.2A is not a valid data set name as coded, even with OPTIONS VALIDVARNAME=ANY.
Things to check include excessively long character variables, such as $4096 when $20 is sufficient.
The output data set uses the (DROP=) option, which can let unneeded 'tag-along' variables into your per-group output record; an explicit (KEEP=) could be helpful.
As for
by num no no1;
How many BY groups do you have?
Is data set AY constructed elsewhere with a JOIN? If so, the real-world data might be subverting the expectations of the join criteria.
Do you have any log snippets showing, for each step:
- the number of observations read and written
- system memory and CPU usage
Get more info by setting
OPTIONS FULLSTIMER;
The log snippet should look like this:
NOTE: There were #### observations read from the data set ********.********
NOTE: The data set ********.******** has #### observations and #### variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              708.40k
      OS Memory           24584.00k
      Timestamp           06/01/2020 06:56:30 PM
      Step Count          401  Switch Count  0
Can you show the output from
Proc CONTENTS data=<name of SET data set>
Proc CONTENTS data=<name of OUTPUT data set>
After you create dataset 2A (an erroneous data set name), you delete data set AY.
So I presume AY is created earlier in the program, i.e. it's not a permanent data set. If so, then you might save some resources by declaring AY as a data set VIEW rather than a data set FILE (i.e. DATA AY / VIEW=AY; instead of DATA AY;). This can speed things up because AY is never written to disk. Instead, its contents are dynamically generated and passed to the DATA 2A step only when the SET AY statement is encountered. If AY is not used multiple times in your program, this will likely save a lot of the resources and time ordinarily needed to write data to disk only to re-read and discard it.
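A rough sketch of what that could look like (the statements that actually build AY are not shown, and this assumes that step already delivers the data in num/no/no1 order so the BY statement still works):

/* AY becomes a view: its rows are generated on demand when the
   next step reads it, so nothing is written to WORK on disk. */
data ay / view=ay;
   /* ... the statements that currently create AY ... */
run;

data _2a (drop=ty amt);
   set ay;            /* the view executes here, row by row */
   by num no no1;
   /* ... rest of the step as posted ... */
run;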
My suspicion is that disk or network I/O is the bottleneck. If this data step runs for 24+ minutes AND source ay is a table and not a view, then the data must be high volume: a lot of rows and variables, and/or some character variables with a large length.
- Turn on option FULLSTIMER and share the log with us.
- Run a Proc Contents on ay and share the result with us.
- Drop all variables you can from input data set ay - in the set statement something like: set ay(keep=....);
- Set option COMPRESS=YES at the start of your program and test whether this improves runtimes (see the short sketch after this list).
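For instance, a minimal sketch combining these suggestions (the KEEP= list simply names the variables the posted step references; adjust it to what you really need):

options fullstimer compress=yes;   /* detailed step timings + compressed output data sets */

proc contents data=work.ay varnum;
run;

data _2a (drop=ty amt);
   set ay(keep=num no no1 ty amt hhhoi gjhrt);   /* read only what the step uses */
   by num no no1;
   /* ... rest of the step ... */
run;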
Below are some tweaks to the code you've posted. I'm not sure whether this will improve execution time.
data _2a (drop=ty amt _ind);
set ay;
by num no no1;
array a_vars {*} ccb cles nut amp ss rat xcv wee ler mit;
_ind=WHICHC(ty, 'AA','BB','CCC','DDD','EE','FFF','GGG','HHH','II','JJJ');
if _ind>0 then a_vars(_ind)=amt;
else if ty = 'KK' and hhhoi in ('SETL' 'SET1' 'SET2' 'SET3') then
do;
if gjhrt = '16' then ghfcv = 'SD';
else ghfcv = 'ED';
end;
retain ccb amp cles nut ler mit ss rat xcv wee 0 ghfcv " ";
if last.no1 then
do;
output;
ccb = 0;
cles = 0;
nut = 0;
amp = 0;
ss = 0;
rat = 0;
xcv = 0;
wee = 0;
ler = 0;
mit = 0;
ghfcv = " ";
end;
run;
proc datasets lib=work memtype=data nolist;
   delete ay;
run;
quit;