I'm trying to use the Query Builder to write a percentile (PCTL) function pseudo-manually, where I want to use a column of data from my data set in the expression (since the PTCL function requires the raw data from which you want to calculate the percentile). [Note: I'm finding the 25th percentile in this example, which I know can be done in other ways in EG, but I am wondering about this in general for percentiles not "offered" by Summary Statistics in the Tasks > Describe menu. It also applies to using the raw data in the Query Builder in general.] To do this, I do the following: Start at the Input Data window Select Query Builder Drag my variable of interest (column called Length) over to to the Select Data pane in the Query Builder (shown below) Select the "Add a New Computed Column" button (that looks like a calculator) Select radio button next to "Advanced expression" and click Next I manually type in the "Enter an expression:" pane: PCTL(25, Now I need to get my raw data (n = 46) into the PCTL function, separated by commas, so in the lower left pane, I double-click "Selected Columns" to see my variable length drop down here (see image below) Then, I see this message in the pane to the right: "The maximum number of rows to process for retrieving distinct values may be limited" When I click "Get Values," sure enough - it only pulls the distinct values from my variable Length, so when I then "Select Values" to insert them in the PCTL function, I don't have all my data, only the distinct values (and no, I did not select the "Select distinct rows only" box in the original Query Builder box"). In this case, I have n = 40 distinct values, so those are the only values that get inserted into the PCTL function. I can't have this if I want to calculate something like a percentile - I need all of the data, not just distinct values! Why does it do this? Is there a way to change this? Any help would be greatly (greatly) appreciated.
... View more