Efficiency:
If the dataset has one million observations, which of the following programs is more efficient in terms of reducing CPU time?
Partial dataset:
Dataset:
var3  seq1  var1  var2
US    100   10    1
EU    200   20    2
IND   300   30    3
Program No.1
data US IND EU;
  set dataset;
  if var3='US' then output US;
  if var3='EU' then output EU;
  if var3='IND' then output IND;
run;
Program no. 2
%macro report(country);
data &country;
  set dataset;
  /* double quotes are needed so that &country resolves inside the literal */
  if var3="&country" then output &country;
run;
%mend report;
%report(US)
%report(EU)
%report(IND)
I expect program 1 would be faster because it reads the input only once.
Both programs write the same three datasets.
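One way to check this rather than guess is to time both programs on a test table. A rough sketch, assuming a synthetic one-million-row input (the generated values are made up for illustration and are not the poster's data):

```sas
/* Build a synthetic 1,000,000-row test table (illustrative values only) */
data dataset;
  length var3 $3;
  do seq1 = 1 to 1000000;
    var3 = choosec(mod(seq1, 3) + 1, 'US', 'EU', 'IND');
    var1 = 10;
    var2 = 1;
    output;
  end;
run;

/* FULLSTIMER writes detailed CPU and real-time figures to the log for each step */
options fullstimer;

/* Program 1: a single pass over the input */
data US IND EU;
  set dataset;
  if var3='US' then output US;
  if var3='EU' then output EU;
  if var3='IND' then output IND;
run;
```

Run Program 2 afterwards in the same session and compare the "cpu time" lines in the log. Timings vary from run to run, so repeat each program a few times before drawing conclusions.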
The following logic is also quicker, because with ELSE the remaining comparisons are skipped once one condition is true. If you put the IF statements in order of most frequent value first, that is faster again (here assuming US has more rows than EU, and EU has more rows than IND).
if var3='US' then output US;
else if var3='EU' then output EU;
else if var3='IND' then output IND;
Program no 2 shouldn't be allowed, anywhere! A single if with an output... Always use WHERE in such situations.
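For reference, a minimal sketch of what the macro from Program 2 could look like with WHERE instead of a subsetting IF with OUTPUT. It still reads the input once per country, but WHERE filters observations before they are loaded into the program data vector:

```sas
%macro report(country);
  data &country;
    set dataset;
    /* WHERE selects rows on input; no explicit OUTPUT is needed */
    where var3 = "&country";
  run;
%mend report;

%report(US)
%report(EU)
%report(IND)
```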
Agree with SASkiwi, based on the limited information in the post.
Is this a real use case, or just an educational question?
If the data set (or DBMS table) has many more values for var3 than the three mentioned, a macro approach with WHERE could be more efficient, provided that var3 is indexed, or the table is partitioned (e.g. Oracle) or clustered (SPDS) by var3.
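As a sketch of the indexed case for a native SAS data set, assuming the table lives in WORK under the name dataset as in the question:

```sas
/* Create a simple index on var3 */
proc datasets library=work nolist;
  modify dataset;
  index create var3;
quit;

/* MSGLEVEL=I makes SAS note in the log when an index is used for WHERE processing */
options msglevel=i;

data US;
  set dataset;
  where var3 = 'US';
run;
```

SAS decides for itself whether using the index is worthwhile. With only three values spread over a million rows it may well choose a sequential read instead, which is why this approach only pays off when each value selects a small share of the table.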