1. All the comments made before are entirely valid.
The most valid one is: show us more.
If you call a garage and state that your car runs poorly and don't give more details, the chances of a useful diagnostic are zero.
2. You are showing 2 steps here. How long do they take?
To repeat the message you seem to stubbornly ignore: where's the log?
3. This code hardly does anything. Show us the log please. It looks like you must have slow disks if this simple logic takes time.
4. If you have slow disks, large (and small) tables are best stored in SPDE format with binary compression turned on. This lowers disk access, and also allows on-the-fly sorting, which is rather efficient.
Since you mention sorting as an issue (but then, for some reason, you show us a data step as an example of a slow step), this could be a double win.
5. If you have data steps doing so little processing, it is possible (and common) that the code multiplies time-consuming baby steps instead of having fewer, smarter steps.
6. How long does the proc datasets take? If you have slow libraries (read RDBMS, or very sadly SPDE), this deletion could take a while for no good reason. Replace the proc datasets step with:
proc delete data=AY; run;
Hi @ChrisNZ,
Previously I blamed PROC SORT without really looking at the full code.
Then I followed what @kurt suggested and found that the data step I pasted takes more time than the other sections of the code. Thanks for the practical advice, which I can easily understand.
The log contains PII. How can I share it, and how can I mask it? The log is very big; please advise.
No RDBMS, pure SAS tables.
Actually I can share the whole code, but it contains PII, and I think masking it manually would take a long time. Please suggest any options you have so that I can share the log along with the code and get the best help from you.
As ChrisNZ said, try these system options and the SPDE engine (which has some drawbacks).
1)
options bufno=100 bufsize=128k compress=yes threads cpucount=4 ;
2)
libname x spde 'c:\temp';
If your data step throws lots of messages to the log, this can be part of the problem. If that is the case, sanitize your data step(s) so that only the essential NOTEs are displayed:
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
A data step that comes back with more than these needs to be fixed. No "invalid data", "numeric converted to character", "character converted to numeric", "missing values because of ..." allowed.
If in doubt, show us the summary lines of your long-running data step; these should not contain critical information (you may want to mask the dataset names).
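As an illustration of cleaning up one of those NOTEs, here is a minimal sketch of replacing an implicit character-to-numeric conversion with an explicit one. The data set names HAVE and WANT and the variables AMOUNT_C and AMOUNT_N are hypothetical; they are not taken from the code in this thread.

```sas
/* Hypothetical input: AMOUNT_C is a character variable holding digits */
data have;
  amount_c = '123.45';
run;

data want;
  set have;
  /* An implicit conversion such as  total = amount_c * 2;             */
  /* would write "Character values have been converted to numeric      */
  /* values" to the log. Converting explicitly with INPUT avoids that. */
  amount_n = input(amount_c, 32.);
  total = amount_n * 2;
run;
```

Whether the conversion belongs in this step or further upstream (so that the variable is numeric from the start) depends on where the character values come from.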
Thanks @Kurt_Bremser. Please have a look at the code snippet and log below. The run started at 02:49:17 AM and finished at 07:04:57 AM.
data ytregwve (drop=uhndshgdyhrg werrwerrtyu);
  set qaszxdc;
  by gggggg rererrt trwevds;
  if uhndshgdyhrg = 'TESTETGED' then AAA = werrwerrtyu;
  else if uhndshgdyhrg = 'ERHWGRB' then fdgfgd = werrwerrtyu;
  else if uhndshgdyhrg = 'TGBE' then clcpc = werrwerrtyu;
  else if uhndshgdyhrg = 'YHHGE' then lalal = werrwerrtyu;
  else if uhndshgdyhrg = 'TGBV' then ewrwe = werrwerrtyu;
  else if uhndshgdyhrg = 'YUHR' then clexgrat = werrwerrtyu;
  else if uhndshgdyhrg = 'EDRF' then clmanxs = werrwerrtyu;
  else if uhndshgdyhrg = 'YUJF' then dftg = werrwerrtyu;
  else if uhndshgdyhrg = 'UIWER' then trtrtr = werrwerrtyu;
  else if uhndshgdyhrg = 'UIKJE' then fdffd = werrwerrtyu;
  else if uhndshgdyhrg = 'IKERDF' and xxx in ('YHHGE' 'YUYEW' 'UHYER' 'UHYT') then do;
    if rterttjki = '1' then ewr = 'SD';
    else ewr = 'WE';
  end;
  retain AAA lalal fdgfgd clcpc trtrtr fdffd
         ewrwe clexgrat clmanxs dftg 0 ewr " ";
  if last.trwevds then do;
    output;
    AAA = 0;
    fdgfgd = 0;
    clcpc = 0;
    lalal = 0;
    ewrwe = 0;
    clexgrat = 0;
    clmanxs = 0;
    dftg = 0;
    trtrtr = 0;
    fdffd = 0;
    ewr = " ";
  end;
run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
1672:14
NOTE: There were 181103546 observations read from the data set WORK.qaszxdc.
NOTE: The data set WORK.ytregwve has 25969878 observations and 67 variables.
NOTE: Compressing data set WORK.ytregwve decreased size by 35.41 percent.
Compressed is 335475 pages; un-compressed would require 519399 pages.
NOTE: DATA statement used (Total process time):
real time 24:39.42
user cpu time 2:24.38
system cpu time 1:03.03
memory 3948.53k
OS Memory 13348.00k
Timestamp 01/06/2020 04:47:32 o'clock
Page Faults 675259
Page Reclaims 124349
Page Swaps 0
Voluntary Context Switches 1887
Involuntary Context Switches 57128
Block Input Operations 0
Block Output Operations 0
Directory
Libref WORK
Engine V9
105 The SAS System 02:36 Monday, June 1, 2020
Directory
Physical Name /home/healthcare/work2/SAS_workCBF90082018E_uknwsaviv764
Filename /home/healthcare/work2/SAS_workCBF90082018E_uknwsaviv764
Inode Number 487424
Access Permission rwxrwxrwx
Owner Name sasleg
File Size (bytes) 4096
Member
# Name Type File Size Last Modified
1 qaszxdc DATA 53220147200 01-Jun-20 04:22:42
2 ytregwve DATA 8244641792 01-Jun-20 04:47:32
First test. Replace:
data ytregwve (drop=uhndshgdyhrg werrwerrtyu);
set qaszxdc;
with:
%let wdir=%sysfunc(pathname(WORK));
libname W spde "&wdir" partsize=100g compress=binary;
data W.ytregwve (drop=uhndshgdyhrg werrwerrtyu);
set qaszxdc(bufno=100);
and tell us the results.
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
1672:14
What's on line 1672?
I see that you have real time of 24 minutes, but CPU time of only 3.5 minutes. This points to either insufficient (in terms of throughput and access times) or congested storage for your WORK location. Or there simply were so many jobs running at the same time that you only got a small piece of the server for your job.
Get in touch with your server admins to check out what is actually happening (too many jobs vs. lots of wait states).
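To put numbers on that, here is a rough back-of-the-envelope calculation using only the figures from the log posted above. It assumes that the file sizes in the directory listing approximate the bytes actually read and written by the step, which is a simplification.

```sas
/* Rough estimate from the posted log; not a measurement */
data _null_;
  real_sec   = 24*60 + 39.42;                  /* real time 24:39.42      */
  cpu_sec    = (2*60 + 24.38) + (60 + 3.03);   /* user + system CPU time  */
  bytes_in   = 53220147200;                    /* file size of qaszxdc    */
  bytes_out  = 8244641792;                     /* file size of ytregwve   */
  cpu_pct    = 100 * cpu_sec / real_sec;       /* share of time on CPU    */
  mb_per_sec = (bytes_in + bytes_out) / real_sec / 1e6;
  put cpu_pct= 5.1 mb_per_sec= 6.1;
run;
```

The CPU was busy for only about 14 percent of the elapsed time, and the combined read/write rate works out to roughly 41.5 MB per second. Both figures are consistent with the step spending most of its time waiting on I/O.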
Thanks, all, for pointing me in the right direction.
I learnt many things from all of you. Thanks.
Please report your results.
What have you tried and what results did you get?
I did not want your whole code, I just wanted the log summary lines of your single longest-running step.
When wetware is insufficient, sometimes it is more cost-effective and timely to improve the hardware: more RAM, more SSD, more CPU, more GHz.
Regarding the performance of the step that was presented, there are three places where the code is slow.
1. IF conditions: using arrays addresses this to some extent, but does not eliminate it completely, because the KK scenario is more complex.
2. first. and last. processing is a tad slower (see whether PROC MEANS/SUMMARY can be used to aggregate the new fields created in the BY-group step, including the new character fields).
3. PROC DATASETS or PROC DELETE might be time-consuming. To get rid of the data in AY, you could use a silly trick like:
data AY;
a=.;
run;
This overwrites AY with a single-observation data set, discarding all the data that was in it (the data set itself still exists).
Hope this helps