Suppose I work for a bolt-manufacturer. And for quality control we sample 3 bolts from each lot and measure their length.
I want to build a VA report that would show a simple control chart. The control chart would show a scatter plot of each lot mean (xbar_i). With control limits shown as reference lines, and control limits calculated as mean(xbar_i) +/- 3 * standard deviation(xbar_i). Code below is an example of doing this in SQL. Just compute the lot means, then compute the mean of the lot means, and standard deviation of the lot means.
Would it be possible to create this chart in VA (6.4 or even later), using the HAVE dataset of individual bolt measurements as the source data? Not asking for someone to actually make the chart for me. But just curious if computing this sort of derived variable is possible. I don't see a standard deviation aggregate measure. And not sure if I did, if I could specify this sort of grouped aggregation.
I'm sure I could compute the process control limits via SQL or perhaps even PROC SHEWHART as a preprocessor before loading the data into VA. But then I would lose VA features to allow the user to select a set of lots to include in the chart, and have the control limits recalculated using only the selected lots.
Thanks for thoughts.
data have; input lot length; cards; 1 5 1 6 1 7 2 6 2 5 2 6 3 3 3 4 3 4 ; run; proc sql; create table want as select lot ,xbar ,mean(xbar) as xbarbar ,std(xbar) as stderr ,calculated xbarbar + 3*calculated stderr as UCL ,calculated xbarbar - 3*calculated stderr as LCL from ( select lot, mean(length) as xbar from have group by lot ) ; quit; proc sgplot data=want; series x=lot y=xbarbar; series x=lot y=LCL; series x=lot y=UCL; scatter x=lot y=xbar; xaxis type=discrete; run;
StdDev and StdErr operators were added in VA 7.1:
Would something like the following expression for UCL work for you?
Avg [_ByGroup_] ('xbar'n) + ( StdErr [_ByGroup_] ('xbar'n) * 3 )
Thanks. I'll be stuck on 6.4 a few more months, so can't try this yet.
In the expression you wrote, it looks like the input variable is XBAR. Is it possible to build an expression where the input variable is LENGTH? So that it's a nested aggregation, i.e. first aggregate the LENGTH variable by lot to get lot means, then aggregate the lot means to get grand mean and standard deviation of the lot means?
So basically, is it possible to compute an aggregated metric and then use that aggregated metric in the computation of another aggregated metric? I think that's my general question.
For example, in 6.4 I could not figure a way to compute standard deviation myself. If it were possible to use nested aggregate metrics I could compute a mean and n-1, then compute the deviations from the mean etc.
I don't think you can nest aggregations in an expression. I could be wrong, but it doesn't seem to work for me.
It might be possible to calculate one aggregation as part of your data preparation and then perform the second one in your aggregated expression.
Hopefully someone else can say for sure.
Thanks again. The inability to nest aggregations is what I had feared (and seen). Just getting my feet wet with VA.
Yes, for some of what I want to do, pre-aggregating during ETL could accomplish it. But then a main benefit of VA is that you can load lots of low-level data and then let the reporting layer choose the appropriate aggregation.
As A SAS programmer, I'd be really happy if it were possible to add some old-school SAS code as a "pre-code" to a VA report, similar to how you can add pre-code to a DI job. In my head, I could use report pre-code to pre-process data (with the source data and WORK library both in memory), so could add variables that way. And then I guess would need to have an easy way to register these new tables in the metadata as temporary (?) tables, to be used only by that one report, and deleted after the report is run. Maybe that's a crazy idea. But would certainly be nice to leverage some of the power of Base SAS data manipulation (DATA step/proc SQL/so so many functions) as part of building a VA exploration/report.
You can add 'pre-process' code to a data query using the data builder (data preparation). This might be the solution that you're looking for.
Here is some information about adding pre- or post- process code:
Here is some information about creating data queries:
Thanks, but I don't think that gets me where I want to go. Starting back with my original example where I have a dataset of bolt lengths in memory on the LASR server, and I want to have a VA report that shows a control chart. In order to make the control chart, the VA report will have to compute (or read) xbar and xbarbar and stderr. I also want the user to be able to select the lots to include in the analysis. The value of stderr is dependent upon which lots are chosen.
I would think if I do this as pre-process code in the data builder it would work, but if I compute the stderr in the data builder the user would lose the ability to select lots and have stderr computed from just the selected lots. Right?
That's why I was thinking what I want is a way to have pre-code that will execute as part of the VA report. Or I suppose if I could say that when a report runs it first triggers a parameterized data builder job to run, passing it the lot selection parameters, so that the source data are updated each time a report is run. Or if it was possible to define some sort of dynamic view as a metadata object created in the data builder, then when a VA report runs have it pass parameters to that view so that the selected lots are chosen and then std error of those lots is calculated.
See how to use one filter for multiple data sources by mapping your data from SAS’ Alexandria McCall.
Find more tutorials on the SAS Users YouTube channel.