11-27-2013 01:28 AM
In order to measure performance and whether we can reduce any execution time, I am trying to convert some of SAS code into SQL query. While doing it, I came across a piece of code weight is being calculated. Need to know how SAS calculates weight for any given column so that I can try and replicate the same using SQL.
proc means data=input_file noprint;
Any insight on this would be of great help to me.
11-27-2013 03:51 AM
The formulas used are - if not obvious - normally documented as part of the documentation for the procedure.
Not sure if changing everything to SQL will help you a lot unless the data tables are stored in a data base. If stored in a data base: You might want to investigate the in-database capabilities of some of the PROCs (eg. Proc Means).
The quickest way of improving performance is in my experience to reduce passes through the data, reduction of volumes as fast as possible, re-design lookups (eg. by using hash tables).
11-27-2013 05:00 AM
Patrick, Appreciate your response and I do agree with the Performance improvement point.
But as per the assignment, I need some way to do it in SQL. If some one could provide a sample input/output with the weight function, then I can re-engineer it to see how its calculated.
Thanks & Best Regards,
11-28-2013 10:33 AM
It would take two passes through the data to get the same result in SQL, but the final formula would be:
C_A*weight/sum_of_weights as weighted_mean
where the sum of weights would need to be calculated for each period. You would need to join the summed weights back to the original data.
11-28-2013 10:54 AM
It's a weighted average
Additionally you need to account for missing and negative weights separately
|Weight value...||PROC MEANS...|
|0||counts the observation in the total number of observations|
|less than 0||converts the value to zero and counts the observation in the total number of observations|
|missing||excludes the observation|