- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
dataset k;
input a b c d;
datalines;
1 2 4 4
1 3 3 4
2 3 4 4
so on ;
run;
data k;
set k;
if( a=xyz and b=lmn and c=hjk and d=cde) then;
flag1=x1;
flag2=x2;
flag3=x3;
else;
flag1=y1;
flag2=y2;
flag3=y3;
run;
I am worried about the execution time of the above. Could anybody help me optimizing the above problem. One solution I have is to index the four columns. Please provide some useful solution to optimize it in best possible manner.
Thanks in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
1M records 4 numeric of length 8 bytes that is 16Mbyte having 16 numeric 16Mbyte something so small it will be cached in memory. Nothing to bother about. Do you have specs that is more big?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if dataset has 4 millions row then there will 128Mbyte data to cached and I don't know how well will SAS be able to tackle it with its cached memory.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Check your hardware settings, even UE is coming with the option set on about 2-3Gb. Your desktop is coming with at least 4Gb and can be easily 12G. Mostly server sessions even far higher.
With those low values (128 Mb) the IO OS systemcache will do. Optimizing buffersize at eg 128kb and there will a 1000 buffers needs. Unless you are needing split second response time looking not a real issue.
Your selection looks simple (not complicated) your data looks simple (not complicated) unless you are having some challenges there I do not see a lot to optimize with all sequential processing on all data (fastest was)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like Jaap said, 4 million is NOT a large dataset for SAS.
I'd be more concerned about whether your code is doing what you want. You have:
if( a=xyz and b=lmn and c=hjk and d=cde) then;
flag1=x1;
flag2=x2;
flag3=x3;
else;
flag1=y1;
flag2=y2;
flag3=y3;
What do xya, lmn, hjk and cde represent? If they're not variables, the above code will fail.
Also, the code doesn't make sense as written. Don't you really want something like?:
if( a=xyz and b=lmn and c=hjk and d=cde) then do;
flag1=x1;
flag2=x2;
flag3=x3;
end;
else do;
flag1=y1;
flag2=y2;
flag3=y3;
end;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Even 4 million records with 4 numeric variables is EXTREMELY SMALL for SAS. I'd say that at this time, "big" starts at files well above 10 GB in size.
If you have trouble, show the log of your data step with fullstimer enabled.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Forgetting the fact that your dataset is too small for a significant performance improvement in SAS , fix your code by including where clause instead of if statements, which will give you a better performance gain theoretically.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do all records receive the same set of flags when match occurs and similarly other set of flags when match fails? Do the flags are CONSTANT?