I have a dataset with about 100 groups, and I need a histogram of "Number of Cycles" for each group; however, I only want to display values of "Number of Cycles" that are greater than Q3. Is there a convenient way to do this?
Please show an example of your data and the code you have so far. By Q3, do you mean third quartile / 75th percentile? If so, you could calculate the 75th percentile by group using PROC MEANS (or similar), merge it onto your data by group, and then subset with a WHERE statement such as where Var > Var_Q3. But that's just a guess without seeing sample data.
Appreciate the help. My variable of interest is not a date; it takes on values from 0 to 1500. Also the Q3 will be different for each group, and I believe a where statement will apply universally to all groups.
Here is an example of the data structure. Each group could have as many as 1000 different values of "Number of Cycles", so each group will have a different Q3. Thus, I am looking to create a separate graph for each group, but the histogram will only display values that are greater than Q3 for that particular group. I apologize if my original question was too ambiguous.
Group | Number of Cycles |
Group 1 | 1 |
Group 1 | 9 |
Group 1 | 2 |
Group 1 | 1 |
Group 1 | 10 |
Group 1 | 5 |
Group 2 | 1 |
Group 2 | 2 |
Group 2 | 3 |
Group 2 | 45 |
Group 3 | 2 |
Group 3 | 5 |
Group 3 | 6 |
Group 3 | 4 |
Group 3 | 5 |
Group 3 | 3 |
Use PROC MEANS to get the value of Q3 in each group. Then merge the Q3 values with the data and output only the values that exceed Q3:
data Have;
input Group N;
datalines;
1 1
1 9
1 2
1 1
1 10
1 5
2 1
2 2
2 3
2 3
2 4
2 6
2 20
2 45
3 2
3 5
3 6
3 4
3 5
3 3
;
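/* Compute Q3 for each group; BY processing requires Have to be sorted by Group */
proc means data=Have noprint;
   by Group;
   var N;
   output out=Q3Out(keep=Group Q3) Q3=Q3;
run;

/* Attach each group's Q3 to its rows and keep only values above it */
data Want;
   merge Have Q3Out;
   by Group;
   if N > Q3;   /* subsetting IF: Q3 differs from group to group */
run;

proc print data=Want;
   var Group N Q3;
run;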
I don't fully understand if you want a histogram for each group or if you want one histogram that combines all the data across groups. I think the first. If so, you can use PROC SGPANEL and PANELBY Group, or if you want many small histograms (not paneled), use PROC SGPLOT and BY Group.
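As a rough sketch of both options, assuming the Want data set created above (variables Group, N, and Q3, sorted by Group), something like the following might work. The axis labels, the title, and the 3-by-2 panel layout are illustrative choices, not requirements:

```sas
title "Number of Cycles above the group Q3";

/* Option 1: paneled histograms, one cell per group.           */
/* COLUMNS= and ROWS= are illustrative; with about 100 groups   */
/* the panels will spill over onto several graphs.              */
proc sgpanel data=Want;
   panelby Group / columns=3 rows=2;
   histogram N;
   colaxis label="Number of Cycles";
run;

/* Option 2: one separate histogram per group.                  */
/* BY processing assumes Want is sorted by Group, which is true */
/* when it comes from the MERGE ... BY step above.              */
proc sgplot data=Want;
   by Group;
   histogram N;
   xaxis label="Number of Cycles";
run;

title;   /* clear the title */
```

With around 100 groups, the BY-group version produces one graph per group, while the paneled version packs several groups into each graph.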