02-03-2015 12:23 PM
I'm trying to use the Query Builder to write a percentile (PCTL) function pseudo-manually, using a column of data from my data set in the expression (since the PCTL function requires the raw data from which you want to calculate the percentile). [Note: I'm finding the 25th percentile in this example, which I know can be done in other ways in EG, but I'm asking about this in general, for percentiles not "offered" by Summary Statistics in the Tasks > Describe menu. It also applies to using raw data in the Query Builder in general.]
To do this, I do the following:
I can't have this if I want to calculate something like a percentile - I need all of the data, not just distinct values! Why does it do this? Is there a way to change this?
Any help would be greatly (greatly) appreciated.
02-03-2015 12:38 PM
I think you're using it in a way that wasn't intended. That list is intended for WHERE or IF clauses, so having a unique list makes sense, and I doubt there's a way to change it.
I wouldn't recommend this method of calculating percentiles; it would be difficult for anyone else to maintain, explain, or follow.
02-03-2015 12:43 PM
Not intended and unconventional. Generally, the purpose is to reference data sets and variables, not to include the raw data.
If you want to go this route, add a step that selects all the values into a macro variable and use that in your function.
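A minimal sketch of that macro-variable route, assuming SASHELP.CLASS and its HEIGHT variable as stand-ins for the real data set and column (neither is from the original post; the macro variable name vals is also made up):

```sas
/* Assumption: sashelp.class / height stand in for the real data set and column. */
proc sql noprint;
    /* Collect every value, duplicates included, into one comma-separated macro variable. */
    select height
        into :vals separated by ','
        from sashelp.class;
quit;

data _null_;
    /* PCTL takes the percentage first, then the list of values. */
    p25 = pctl(25, &vals);
    put p25=;   /* writes the result to the log */
run;
```

This is only practical for small data sets: a macro variable is capped at roughly 64K characters, and the values are converted to text on the way in, so some precision may be lost.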
02-03-2015 12:50 PM
Okay. Thanks for letting me know. I'm not an EG user (I code in SAS), but I'm trying to use it for my intro stats class. It just seems intuitive to me that when you pull a variable over into the Query Builder, it should use that data as-is, or at least allow that option, especially when all functions, including those that require raw data such as PCTL, are included in its menu in the Advanced Expression window. It appears I don't understand the purpose of the Query Builder holistically.
I'm going to skip introducing them to Query Builder at this point and just wait until we use functions that don't require raw data (like finding p-values based on a test statistic from a known distribution).
02-03-2015 12:55 PM
Query Builder essentially builds SQL code.
I'm sure you know, but some functions in SAS, such as PCTL and MEDIAN, don't work down a variable (column) in SAS SQL; they work across the values within a row.
You could transpose the data (Transpose step) and then use the values that way, though it seems like more work than writing some SAS code, e.g. PROC UNIVARIATE.
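Both options can be sketched in code, again assuming SASHELP.CLASS and HEIGHT as stand-ins for the real data; the output data set names and the h prefix are made up for illustration:

```sas
/* Assumption: sashelp.class / height stand in for the real data set and column. */

/* Option 1: transpose the column into one wide row, then apply PCTL across it. */
proc transpose data=sashelp.class out=work.wide prefix=h;
    var height;
run;

data work.p25_wide;
    set work.wide;
    p25 = pctl(25, of h:);   /* h1, h2, ... hold the original values */
    keep p25;
run;

/* Option 2: let PROC UNIVARIATE compute the requested percentiles directly. */
proc univariate data=sashelp.class noprint;
    var height;
    /* PCTLPTS= accepts arbitrary percentiles; this creates P25 and P37_5. */
    output out=work.pctl_uni pctlpts=25 37.5 pctlpre=P;
run;
```

The second option is shorter and also covers percentiles that the Summary Statistics task doesn't list.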