sasthalie
Fluorite | Level 6

Hi,

My data looks like this: 

patient_id  start_date  _name_  _label_  col1  level
1           16/01/2021  level1  level1   1     1
2           17/05/2020  level3  level3   1     3
3           21/02/2021  level1  level1   1     1
4           15/05/2020  level3  level3   1     3
5           13/06/2020  level3  level3   1     3
6           28/03/2020  level3  level3   1     3

 

I created a stacked histogram using this code: 

proc sgplot data=level_data;
histogram start_date / group=level;
run;

 

Each line is a patient with a date and a level. The histogram shows: x=start_date, y=number of patients, with stacks for each level (1 to 4) (see document). I have 2 problems: 

 

1) Since my data is sorted by patient_ID and start_date, the stacks appear in this order, from top to bottom: 1-3-2-4. I would like them to appear, from top to bottom, as 4-3-2-1. I have tried sorting the data by level, but it doesn't give me the result I expect on the graph.

2) For x=April on the graph, the level1 group shows 95 patients, which I have confirmed in my data. However, the stack's height is obviously out of proportion (too short) compared to the other level groups in the same month. Why is that? I have tried increasing the y-axis scale to 300, but the visual proportion of this group doesn't change.

 

Thank you

ballardw
Super User

Unless you use options to control the range of x values in a single bar, HISTOGRAM builds the bars using internal rules, and in this case those rules apparently do not match what you want. Your group variable only affects the resulting stacks after the width of values in each bar has been determined. Look at the NBINS= or BINWIDTH= options to control the width of x values in each bar.

 

Since your code does not show any way that you are creating "April" on the axis, the problem may be more complicated, or possibly all you need to do is provide a correct format. With none shown, it is hard to tell.

 

Without complete data, in the form of a data step, I can't supply specific suggestions because the complete range of your values would be needed.

 

Many users here don't want to download Microsoft files because of virus potential, others have such things blocked by security software.

 

sasthalie
Fluorite | Level 6
/* sort by the same BY variables that PROC TRANSPOSE uses below */
proc sort data=A;
by patient_ID start_date;
run;

proc transpose data=A
out=long(where=(col1 ne 0));
by patient_ID start_date;
var level:;
run;

data level_data;
set long;
level = substr(compress(reverse(_name_)), 1, 1);
run;

/*histogram*/
proc freq data=level_data;
table start_date * level / nopercent norow nocol;
run;

proc sgplot data=level_data;
histogram start_date/ group=level datalabel=count scale=count;
run;


Here's my code, hope it helps. I'm trying to limit what I share for confidentiality reasons. 

I attached a pdf version of the histogram, in case that's better. 

-I didn't create "April" on the x axis. The proc sgplot did it directly, from my start_date variable (format mmddyy)

-I'm not sure I understand how I should modify the x-axis bins to change the groups' order. From my understanding, SAS chooses to display the groups in the level 1-3-2-4 order (top to bottom) because that's the order they show up in my data (see the data in my first message). The data is sorted by patient ID and start_date, not by level. When I tried also sorting the data by level, it didn't work, since the x axis needs the data to be sorted by start_date.

 

Thank you!

ballardw
Super User

Since your plot has nothing to do with patient id, sort it by start_date and level prior to graphing. If you don't have any level 1 for the earlier start dates then you may need something else.

 

Since your Plot only uses start_date and level, I do not see that sharing those variables violates any confidentiality.

 

Do you actually have a preference for what the horizontal groups should be? Histograms without controls on the bin width are quite often going to use bins that don't make sense for some purposes, such as matching a particular time-interval total. With dates, the axis tick marks also typically are not where you expect them.

 

You really need to look at the options I mentioned for width control.

From the documentation:  "If neither BINWIDTH= nor the NBINS= option is specified, the system determines the number of bins.". When dealing with date values that "number of bins" will ignore the nature of dates until a label is placed on the tickmarks for the axis.

 

As a minimum, to get a slightly better handle on what your x axis is telling you, I suggest using the SHOWBINS option. That way the tick marks placed on the graph will be in the middle of the range for each bin.
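As a rough illustration of the bin-control options mentioned above, here is a minimal sketch. The level_demo data set is made up from the six rows shown in the first post (it is not the real data), and BINWIDTH=30 is an arbitrary choice of roughly one month, since SAS date values are counted in days.

```sas
/* Hypothetical demo data copied from the six rows in the question */
data level_demo;
  input patient_id start_date :ddmmyy10. level $;
  format start_date mmddyy10.;
  datalines;
1 16/01/2021 1
2 17/05/2020 3
3 21/02/2021 1
4 15/05/2020 3
5 13/06/2020 3
6 28/03/2020 3
;
run;

proc sgplot data=level_demo;
  /* BINWIDTH= is in the units of the variable (days for a SAS date);   */
  /* SHOWBINS places the axis ticks at the midpoint of each bin.        */
  histogram start_date / group=level scale=count
                         binwidth=30 showbins;
run;
```

Even with a 30-day width, the bins will not line up exactly with calendar months, which is the limitation described in the next paragraph.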

 

If you are interested in what happens in a given month then you want to use a VBAR not histogram and a format like MONYY so the bars actually correspond to months. Otherwise with histogram you are almost never going to get monthly xaxis grouping because that is not the purpose of histogram.
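A minimal sketch of the VBAR approach, under the assumption that monthly counts per level are what is wanted. The level_demo data set is made up from the rows in the first post, and the axis labels are placeholders. GROUPORDER= is used here so that the stacking order follows the level values rather than the order in which levels first appear in the data.

```sas
/* Hypothetical demo data copied from the six rows in the question */
data level_demo;
  input patient_id start_date :ddmmyy10. level $;
  datalines;
1 16/01/2021 1
2 17/05/2020 3
3 21/02/2021 1
4 15/05/2020 3
5 13/06/2020 3
6 28/03/2020 3
;
run;

/* Sort by date so the months are also in order in the data itself */
proc sort data=level_demo;
  by start_date level;
run;

proc sgplot data=level_demo;
  /* MONYY7. makes every date in the same month fall into one bar */
  format start_date monyy7.;
  /* With no RESPONSE= variable, STAT=FREQ counts rows (patients) */
  vbar start_date / group=level groupdisplay=stack
                    grouporder=ascending stat=freq;
  xaxis label="Start month";
  yaxis label="Number of patients";
run;
```

If the stack comes out inverted relative to what is wanted, switching to GROUPORDER=DESCENDING should reverse it.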

 

 

 

sasthalie
Fluorite | Level 6

-The issue when I sort by start_date and level is that the first level is 3. If I sort by level only, then the x axis doesn't make sense (start_date is out of order).

-I don't necessarily need the horizontal bins to be of a specific category. I was told histograms are a better fit for continuous variables like time, if I'm not grouping time. What bothers me most is the order of the stacks in each bar (the level groups).

-I have tried playing with the bin width, and it seems to help with the display of each stack.
