I'm trying to use the Query Builder to write a percentile (PCTL) function pseudo-manually, where I want to use a column of data from my data set in the expression (since the PTCL function requires the raw data from which you want to calculate the percentile). [Note: I'm finding the 25th percentile in this example, which I know can be done in other ways in EG, but I am wondering about this in general for percentiles not "offered" by Summary Statistics in the Tasks > Describe menu. It also applies to using the raw data in the Query Builder in general.]
To do this, I do the following:
I can't have this if I want to calculate something like a percentile - I need all of the data, not just distinct values! Why does it do this? Is there a way to change this?
Any help would be greatly (greatly) appreciated.
I think you're using it in a way not intended. That list is intend for WHERE or IF clauses so having a unique list makes sense and I doubt there's a way to change it.
I wouldn't recommend this method of calculating percentiles, it would be difficult to maintain, explain or follow for anyone else.
It seems unreasonable to ask Query Builder to insert raw data as an argument in a function?
Not intended and unconventional. Generally, the purpose is to reference data sets, and variables not to include the raw data.
If you want to go this route add a step that selects all the values into a macro variable and use that in your function.
Okay. Thanks for letting me know. I'm not an EG user (I code in SAS), but am trying to use it for my intro stats class and it just seems intuitive to me that when you pull a variable over into the Query Builder that it should use that data as-is, or at least allow that option (especially when all functions, including those that require raw data, such as PCTL, are included in its menu in the Advanced Expression window), but it appears I don't understand the purpose of the Query Builder holistically.
I'm going to skip introducing them to Query Builder at this point and just wait until we use functions that don't require raw data (like finding p-values based on a test statistic from a known distribution).
Query Builder essentially builds SQL code.
I'm sure you know, but some functions in SAS, such as PCTL, MEDIAN don't work in SAS SQL on a variable, they work across rows.
You could transpose the data (Transpose Step) and then use the values that way though it seems like more work than writing some SAS code, i.e. proc univariate.
That explains it. I didn't realize it was building SQL code per se. Thanks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.