Dear SAS Folks, I request your help for an efficient solution for a very large dataset that challenges to compare to By-groups within by groups.
Here is the sample of my input data-set(have) with 5 columns: STORE_ID PRODUCT_ID PRODUCT_WIN_POINT1 PRODUCT_WIN_POINT2 MONTHLY_PERIOD W169 F1 1 0 jan W169 F1 0 1 feb W169 F1 0 0 mar W169 F1 1 1 apr W169 F1 0 0 may W169 F1 1 1 jun W169 V1 0 1 jan W169 V1 1 1 feb W169 V1 1 0 mar W169 V1 1 0 apr W169 V1 0 1 may W169 V1 0 1 Jun W169 T1 0 1 jan W169 T1 1 0 feb W169 R1 0 1 mar W169 R1 0 0 may W169 R1 0 1 jun B291 B3 1 0 jan B291 B2 1 1 jun B291 B44 0 1 july
The need is to compare PRODUCT_ID values against MONTHLY_PERIOD within STORE_ID values. Details: Point1: For each STORE_ID, I need to compare PRODUCT_ID F1(product_win_point1 value) to the V1(product_win_point1 value) to the T1(product_win_point1 value) and similarly across all product_id's within a store_id that has values for the month of Jan. In the example above, you would notice there is no product_win_point1 and product_win_point2 value of R1 for the month of Jan.
Point2: Likewise, feb (product_win_point1) value of product id's needs to be compared as explained in point1. And the same exercise should be done for all months available, that is jan to jan, feb to feb, mar to mar........
Point3: Comparison logic--> For example, When comparing the values of product_win_point1 across months, let's say taking the example above: For store_id W169,for the month of Jan, the product_win_point1 value for product_id F1=1, V1=0, T1=0,R1 is missing. Since there is at-least one product_win_point1 with value 1, the result should be stored as 1 for the month Jan. Likewise the same rule applies when comparing product_win_point1 values of product id's for feb against feb, mar against mar......
Point4: The same comparison logic as mentioned in Point1,2 & 3 has to be applied for PRODUCT_WIN_POINT2 values of product_ids against monthly periods.
I trust the points explained above can be comprehended and if not I am so keen to detail further whatever i can.
So, the final resulting WANT(Output) dataset:
WANT: STORE_ID MONTHLY_PERIOD MONTH_as_number COMPARED_PRODUCT_WIN_POINT1 COMPARED_PRODUCT_WIN_POINT2 W169 jan 1 1 1 W169 feb 2 1 1 W169 mar 3 1 1 W169 apr 4 1 1 W169 may 5 0 1 W169 jun 6 1 1 B291 jan 1 1 0 B291 jun 6 1 1 B291 july 7 0 1
... View more