Desktop productivity for business analysts and programmers

Query builder automatically selects only distinct values from raw data

Occasional Contributor
Posts: 10

I'm trying to use the Query Builder to write a percentile (PCTL) function pseudo-manually, using a column of data from my data set in the expression (the PCTL function needs the raw data from which you want to calculate the percentile). [Note: I'm finding the 25th percentile in this example, which I know can be done in other ways in EG, but I'm asking in general about percentiles not "offered" by Summary Statistics in the Tasks > Describe menu. It also applies to using raw data in the Query Builder in general.]

To do this, I do the following:

  • Start at the Input Data window
  • Select Query Builder
  • Drag my variable of interest (a column called Length) over to the Select Data pane in the Query Builder (shown below)

[Screenshot: query01.jpg, Length dragged into the Select Data pane of the Query Builder]

  • Select the "Add a New Computed Column" button (the one that looks like a calculator)
  • Select the radio button next to "Advanced expression" and click Next
  • I manually type in the "Enter an expression:" pane: PCTL(25,
  • Now I need to get my raw data (n = 46) into the PCTL function, separated by commas, so in the lower left pane I double-click "Selected Columns" to see my variable Length drop down there (see image below)
  • Then, I see this message in the pane to the right: "The maximum number of rows to process for retrieving distinct values may be limited"
  • When I click "Get Values," sure enough, it only pulls the distinct values of my variable Length. So when I then "Select Values" to insert them into the PCTL function, I don't have all my data, only the distinct values (and no, I did not check the "Select distinct rows only" box in the original Query Builder window). In this case I have n = 40 distinct values, so those are the only values that get inserted into the PCTL function.

[Screenshot: Query.jpg, the Advanced expression window with Length under Selected Columns]

I can't have this if I want to calculate something like a percentile; I need all of the data, not just the distinct values! Why does it do this? Is there a way to change it?

Any help would be greatly (greatly) appreciated.
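Why a distinct-value list breaks a percentile can be seen in a small Python sketch (Python stands in for SAS here). It assumes the "empirical distribution with averaging" rule, which is PROC UNIVARIATE's PCTLDEF=5 and, we believe, also the default for the PCTL function; treat that match as an assumption. The function name `pctl_def5` and the data are hypothetical.

```python
import math
from fractions import Fraction


def pctl_def5(p, values):
    """p-th percentile by the 'empirical distribution with averaging' rule.

    Assumption: mirrors PCTLDEF=5 (believed to be the SAS PCTL default).
    Missing values (None) are skipped, as PCTL skips missing arguments.
    """
    xs = sorted(v for v in values if v is not None)
    n = len(xs)
    if n == 0:
        raise ValueError("no nonmissing values")
    np_ = Fraction(n) * Fraction(p) / 100  # exact n*p/100, no float rounding
    j = math.floor(np_)                    # integer part of n*p/100
    g = np_ - j                            # fractional part
    if g > 0:
        return xs[j]                       # x(j+1), 1-based order statistic
    if j == 0:
        return xs[0]
    if j >= n:
        return xs[-1]
    return (xs[j - 1] + xs[j]) / 2         # average of x(j) and x(j+1)


# Hypothetical Length values with ties: n = 10 rows, 8 distinct values
length = [1, 2, 2, 2, 3, 4, 5, 6, 7, 8]

print(pctl_def5(25, length))               # all rows      -> 2
print(pctl_def5(25, sorted(set(length))))  # distinct only -> 2.5
```

With ties in the data, feeding only the distinct values moves the 25th percentile from 2 to 2.5, which is the problem described above.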

Grand Advisor
Posts: 17,464

Re: Query builder automatically selects only distinct values from raw data

I think you're using it in a way it wasn't intended. That list is intended for WHERE or IF clauses, so having a unique list makes sense, and I doubt there's a way to change it.

I wouldn't recommend this method of calculating percentiles; it would be difficult for anyone else to maintain, explain or follow.
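Why a unique list is enough for WHERE/IF conditions can be sketched in Python: a membership test keeps the same rows whether or not the candidate list repeats values. The `where_in` helper and the `rows` data are hypothetical stand-ins for a `WHERE Length IN (...)` filter.

```python
def where_in(rows, column, allowed):
    """Keep rows whose `column` value appears in `allowed` (like WHERE col IN (...))."""
    return [row for row in rows if row[column] in allowed]


# Hypothetical data: ids 0..4 with Length values 3, 1, 2, 2, 5
rows = [{"id": i, "Length": v} for i, v in enumerate([3, 1, 2, 2, 5])]

with_dups = where_in(rows, "Length", [2, 2, 5])
distinct_only = where_in(rows, "Length", sorted(set([2, 2, 5])))

# Repeated entries in the IN-list change nothing, so a distinct pick list suffices
print(with_dups == distinct_only)          # True
print([row["id"] for row in distinct_only])  # [2, 3, 4]
```

A percentile is different: there, every repeated value changes the answer, so a list built for filtering is the wrong input.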

Occasional Contributor
Posts: 10

Re: Query builder automatically selects only distinct values from raw data

Is it unreasonable to ask Query Builder to insert raw data as an argument to a function?

Grand Advisor
Posts: 17,464

Re: Query builder automatically selects only distinct values from raw data

Not intended, and unconventional. Generally, the purpose is to reference data sets and variables, not to include the raw data.

If you want to go this route, add a step that selects all the values into a macro variable and use that in your function.
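In SAS, the usual way to build such a macro variable is PROC SQL's `SELECT col INTO :macvar SEPARATED BY ','`, which keeps every row unless you add DISTINCT; the expression can then reference it, e.g. `PCTL(25, &lenlist)`, where `lenlist` is a hypothetical name. The Python sketch below only mimics the idea of joining every value into one argument string; `build_arg_list` and the sample rows are hypothetical.

```python
def build_arg_list(rows, column, distinct=False):
    """Join the values of `column` into one comma-separated string.

    distinct=False keeps every row, like SELECT col INTO :list SEPARATED BY ','.
    distinct=True mimics a one-entry-per-value pick list (sorted for readability).
    """
    values = [row[column] for row in rows if row[column] is not None]
    if distinct:
        values = sorted(set(values))
    return ",".join(str(v) for v in values)


rows = [{"Length": v} for v in [3, 1, 2, 2, 5]]

expr_all = f"PCTL(25, {build_arg_list(rows, 'Length')})"
expr_distinct = f"PCTL(25, {build_arg_list(rows, 'Length', distinct=True)})"
print(expr_all)       # PCTL(25, 3,1,2,2,5)
print(expr_distinct)  # PCTL(25, 1,2,3,5)
```

Only the first string passes all five rows to the function. The distinct version silently drops the repeated 2.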

Occasional Contributor
Posts: 10

Re: Query builder automatically selects only distinct values from raw data

Okay, thanks for letting me know. I'm not an EG user (I code in SAS), but I'm trying to use it for my intro stats class. It just seems intuitive to me that when you pull a variable into the Query Builder, it should use that data as-is, or at least allow that option, especially since every function, including those that require raw data such as PCTL, is listed in the Advanced Expression window. But it appears I don't understand the purpose of the Query Builder holistically.

I'm going to skip introducing them to Query Builder at this point and just wait until we use functions that don't require raw data (like finding p-values based on a test statistic from a known distribution).

Grand Advisor
Posts: 17,464

Re: Query builder automatically selects only distinct values from raw data

Query Builder essentially builds SQL code.

I'm sure you know, but some SAS functions, such as PCTL and MEDIAN, don't work on a variable (down a column) in SAS SQL; they work across their arguments, i.e. across the columns within a row.
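The row-versus-column distinction can be sketched in Python with `statistics.median`. Applied to one row's values, it behaves like a row-wise function in a computed column. Applied down a column, it gives the summary statistic a percentile question actually needs. The table and the column names `x1` to `x3` are hypothetical.

```python
import statistics

# Hypothetical wide table: three measurements per row
table = [
    {"x1": 1, "x2": 9, "x3": 5},
    {"x1": 4, "x2": 2, "x3": 8},
    {"x1": 7, "x2": 6, "x3": 3},
]

# Row-wise: one result per row, computed across that row's arguments
row_medians = [statistics.median([r["x1"], r["x2"], r["x3"]]) for r in table]

# Column-wise: one result for the whole column x1, computed down the rows
x1_median = statistics.median(r["x1"] for r in table)

print(row_medians)  # [5, 4, 6]
print(x1_median)    # 4
```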

You could transpose the data (Transpose step) and then use the values that way, though it seems like more work than writing some SAS code, e.g. PROC UNIVARIATE.
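The transpose workaround amounts to turning one long column into one wide row, after which a row-wise function sees every value, duplicates included. A minimal Python sketch, assuming PROC TRANSPOSE-style default names `COL1`, `COL2`, ... (the helper `transpose_column` and the data are hypothetical):

```python
import statistics


def transpose_column(rows, column):
    """Turn a long column into a single wide row: {'COL1': v1, 'COL2': v2, ...}."""
    return {f"COL{i}": row[column] for i, row in enumerate(rows, start=1)}


rows = [{"Length": v} for v in [3, 1, 2, 2, 5, 2]]
wide = transpose_column(rows, "Length")

# A row-wise function applied to the single wide row now receives every value
row_wise = statistics.median(wide.values())
# Same statistic computed down the original long column, for comparison
column_wise = statistics.median(r["Length"] for r in rows)

print(len(wide))  # 6
print(row_wise)   # 2.0
```

The row-wise result on the transposed row matches the column statistic, which is why transposing works. PROC UNIVARIATE reaches the same place with less plumbing.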

Occasional Contributor
Posts: 10

Re: Query builder automatically selects only distinct values from raw data

That explains it. I didn't realize it was building SQL code per se. Thanks.
