In the third maintenance release for SAS 9.4, WHERE processing optimization is expanded. Using the Base SAS SPD Engine with Hadoop, you can request that data subsetting be performed in the Hadoop cluster, which takes advantage of the filtering and ordering capabilities of the MapReduce framework. As a result, only the subset of data is returned to the SAS client.
By default, data subsetting is performed by the SPD Engine on the SAS client. To request that data subsetting be performed in the Hadoop cluster, you must specify the ACCELWHERE= LIBNAME statement option or the ACCELWHERE= data set option.
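As a minimal sketch of the two ways to make that request, the following assumes that the Hadoop connection is already configured for the SAS session. The librefs, HDFS paths, and the data set name employees are hypothetical placeholders, not values from the documentation.

```sas
/* Option 1: request cluster-side subsetting for every data set in the library. */
/* The libref and HDFS path are hypothetical.                                   */
libname spdeall spde '/user/sasdemo/spde' hdfshost=default accelwhere=yes;

proc print data=spdeall.employees;
   where empnum between 500 and 1000;
run;

/* Option 2: leave the library at its default (subsetting on the SAS client) */
/* and request cluster-side subsetting for a single data set reference.      */
libname spdeone spde '/user/sasdemo/spde' hdfshost=default;

proc print data=spdeone.employees (accelwhere=yes);
   where empnum between 500 and 1000;
run;
```

The LIBNAME option suits libraries where most steps subset large tables. The data set option keeps the request limited to the steps that need it.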
WHERE processing optimization supports the following syntax:
comparison operators such as EQ (=), NE (^=), GT (>), LT (<), GE (>=), LE (<=)
IN operator
full bounded range condition, such as where 500 <= empnum <= 1000;
BETWEEN-AND operator, such as where empnum between 500 and 1000;
compound expressions using the logical operators AND, OR, and NOT, such as where skill = 'java' or years = 4;
parentheses to control the order of evaluation, such as where (product='GRAPH' or product='STAT') and country='Canada';
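The supported forms above can be combined in one WHERE expression. The sketch below reuses the variable names from the examples in the list (empnum, product, country). The libref, HDFS path, and data set name are hypothetical. Whether a particular step is actually subset in the cluster also depends on the data set and SAS code requirements described in the full documentation.

```sas
/* Hypothetical library and data set; assumes the Hadoop connection is configured. */
libname myspde spde '/user/sasdemo/spde' hdfshost=default accelwhere=yes;

proc print data=myspde.orders;
   /* IN operator, a bounded range, and a comparison, joined with AND; */
   /* parentheses make the order of evaluation explicit.               */
   where (product in ('GRAPH', 'STAT'))
         and (500 <= empnum <= 1000)
         and country = 'Canada';
run;
```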
For the complete documentation about WHERE processing optimization and the data set and SAS code requirements, see WHERE Processing Optimization with MapReduce.