data k;
input a b c d;
/* sample rows only; the real table has many more (and so on) */
datalines;
1 2 4 4
1 3 3 4
2 3 4 4
;
run;
data k;
set k;
if( a=xyz and b=lmn and c=hjk and d=cde) then;
flag1=x1;
flag2=x2;
flag3=x3;
else;
flag1=y1;
flag2=y2;
flag3=y3;
run;
I am worried about the execution time of the above. Could anybody help me optimize it? One solution I have in mind is to index the four columns. Please suggest the best possible way to optimize it.
Thanks in advance.
1M records with 4 numeric variables of 8 bytes each is about 32 MB, something so small it will be cached in memory. Nothing to worry about. Do you have specs for something bigger?
If the dataset has 4 million rows, there will be 128 MB of data to cache, and I don't know how well SAS will be able to handle that with its cache memory.
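The size arithmetic in the two posts above can be checked with a small DATA step. This sketch assumes only the four 8-byte numeric variables from the question and uses decimal megabytes (1 MB = 1,000,000 bytes); it ignores the per-page and per-observation overhead a real SAS data set adds, so actual files will be somewhat larger.

```sas
/* Rough raw-data size: rows x variables x 8 bytes per numeric */
data work.size_check;
  bytes_per_num = 8;   /* default length of a SAS numeric variable */
  nvars = 4;           /* a b c d */
  do nobs = 1e6, 4e6;
    mb = nobs * nvars * bytes_per_num / 1e6;   /* decimal MB */
    put nobs= comma12. mb=;
    output;
  end;
run;
```

The log shows 32 MB for 1 million rows and 128 MB for 4 million rows.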
Check your hardware and memory settings: even SAS University Edition (UE) comes with the memory option set to about 2-3 GB. A desktop typically has at least 4 GB and can easily have 12 GB, and server sessions mostly have far more.
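To see what the current session actually has, PROC OPTIONS can report the memory-related system options. A minimal sketch; which options matter and their defaults depend on the SAS release and platform.

```sas
/* Show the current MEMSIZE value and how it was set */
proc options option=memsize value;
run;

/* List all system options in the MEMORY group */
proc options group=memory;
run;
```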
With such low volumes (128 MB), the operating system's I/O cache will cope. If you tune the buffer size to, e.g., 128 KB, about 1,000 buffers are needed (128 MB / 128 KB = 1,024). Unless you need split-second response times, this does not look like a real issue.
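A sketch of the buffer arithmetic, plus one way to request the 128 KB page size from the example. BUFSIZE= is a SAS data set option that takes effect only when a data set is created, so it is applied to a new copy here; the name k_big is hypothetical, and whether 128 KB is a good choice depends on the platform.

```sas
/* Sample rows from the question */
data work.k;
  input a b c d;
  datalines;
1 2 4 4
1 3 3 4
2 3 4 4
;
run;

/* Copy the table with 128 KB pages; BUFSIZE= only applies at creation time */
data work.k_big(bufsize=128k);
  set work.k;
run;

/* Number of 128 KB buffers needed to hold 128 MB of data */
data work.buffer_check;
  total_kb  = 128 * 1024;   /* 128 MB expressed in KB */
  buffer_kb = 128;          /* page size from the example */
  buffers   = total_kb / buffer_kb;
  put buffers=;
run;
```

The last step writes buffers=1024 to the log.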
Your selection looks simple and your data looks simple. Unless you have some challenges there that you haven't mentioned, I don't see much to optimize: sequential processing of all the data is already the fastest way.
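The question suggested indexing the four columns. For reference, a composite index can be created with PROC DATASETS as sketched below (the index name abcd and the match values are hypothetical). An index only helps WHERE processing that selects a small fraction of rows; a DATA step that reads and flags every row still reads the table sequentially, which is why indexing is unlikely to help here.

```sas
/* Sample rows from the question */
data work.k;
  input a b c d;
  datalines;
1 2 4 4
1 3 3 4
2 3 4 4
;
run;

/* Composite index on all four columns (hypothetical name abcd) */
proc datasets library=work nolist;
  modify k;
    index create abcd = (a b c d);
  run;
quit;

/* MSGLEVEL=I makes the log report whether an index was used */
options msglevel=i;

/* Only a WHERE that picks out few rows can benefit from the index */
data work.subset;
  set work.k;
  where a = 1 and b = 2 and c = 4 and d = 4;
run;
```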
Like Jaap said, 4 million is NOT a large dataset for SAS.
I'd be more concerned about whether your code is doing what you want. You have:
if( a=xyz and b=lmn and c=hjk and d=cde) then;
flag1=x1;
flag2=x2;
flag3=x3;
else;
flag1=y1;
flag2=y2;
flag3=y3;
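What do xyz, lmn, hjk and cde represent? If they're not variables in your data set, the above code won't do what you want: SAS will create them as new variables with missing values and write a NOTE about uninitialized variables to the log.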
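Also, the code doesn't make sense as written: "then;" ends the IF with a null statement, so the three flag assignments run for every row, and the ELSE has no IF-THEN to attach to, which is a syntax error. Don't you really want something like this?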
if( a=xyz and b=lmn and c=hjk and d=cde) then do;
flag1=x1;
flag2=x2;
flag3=x3;
end;
else do;
flag1=y1;
flag2=y2;
flag3=y3;
end;
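A self-contained version of the corrected step, using the sample rows from the question. The match values 1, 2, 4, 4 and the flag values 1/0 are hypothetical stand-ins for xyz, lmn, hjk, cde and x1-x3 / y1-y3, which the question never defines.

```sas
/* Sample rows from the question */
data work.k;
  input a b c d;
  datalines;
1 2 4 4
1 3 3 4
2 3 4 4
;
run;

/* Hypothetical match values 1 2 4 4 stand in for xyz lmn hjk cde */
data work.k_flagged;
  set work.k;
  if a = 1 and b = 2 and c = 4 and d = 4 then do;
    flag1 = 1; flag2 = 1; flag3 = 1;   /* hypothetical x1-x3 */
  end;
  else do;
    flag1 = 0; flag2 = 0; flag3 = 0;   /* hypothetical y1-y3 */
  end;
run;

proc print data=work.k_flagged;
run;
```

Only the first sample row matches, so it gets flag1-flag3 = 1 and the other two rows get 0.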
Even 4 million records with 4 numeric variables is EXTREMELY SMALL for SAS. I'd say that at this time, "big" starts at files well above 10 GB in size.
If you do run into trouble, post the log of your data step with the FULLSTIMER option enabled.
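FULLSTIMER is a SAS system option; switching it on before a step makes the log report real time, CPU time and memory usage for each step. A minimal sketch; the flagging step and its match values are hypothetical placeholders for your own code.

```sas
options fullstimer;   /* detailed time and memory statistics in the log */

data work.k;
  input a b c d;
  datalines;
1 2 4 4
1 3 3 4
2 3 4 4
;
run;

data work.k_flagged;
  set work.k;
  flag1 = (a = 1 and b = 2 and c = 4 and d = 4);   /* hypothetical test */
run;

options nofullstimer;   /* back to the default, shorter statistics */
```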
Setting aside the fact that your dataset is too small for any significant performance improvement in SAS, you could rewrite your code to use a WHERE clause instead of IF statements, which in theory gives a better performance gain.
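A sketch of the WHERE approach. WHERE filters observations before they enter the DATA step, so it can replace the IF only when you want to keep just the matching rows; the non-matching rows (the ELSE branch) are dropped rather than flagged. The match and flag values are hypothetical.

```sas
data work.k;
  input a b c d;
  datalines;
1 2 4 4
1 3 3 4
2 3 4 4
;
run;

/* Only rows that satisfy the WHERE= condition are read */
data work.k_matched;
  set work.k(where=(a = 1 and b = 2 and c = 4 and d = 4));
  flag1 = 1; flag2 = 1; flag3 = 1;   /* hypothetical "match" flag values */
run;
```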
Do all records receive the same set of flags when a match occurs, and another fixed set when it fails? Are the flags CONSTANT?
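If the flags are constants, the match can be evaluated once as a 0/1 value and reused. A sketch using the IFN function; the match values and the flag pairs are hypothetical.

```sas
data work.k;
  input a b c d;
  datalines;
1 2 4 4
1 3 3 4
2 3 4 4
;
run;

data work.k_flagged;
  set work.k;
  /* A SAS comparison evaluates to 1 (true) or 0 (false) */
  match = (a = 1 and b = 2 and c = 4 and d = 4);
  flag1 = ifn(match, 1, 0);       /* hypothetical x1 / y1 */
  flag2 = ifn(match, 10, 20);     /* hypothetical x2 / y2 */
  flag3 = ifn(match, 100, 200);   /* hypothetical x3 / y3 */
  drop match;
run;
```

The first row gets 1, 10, 100; the other two rows get 0, 20, 200.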