03-31-2014 11:42 PM
This is my first post here, and from what I have read you guys are really helpful and I would appreciate some help with the issues I am facing. I am new to SAS and I have been using it for around 2 months.
Basically, I have a data set with a bunch of different variables, but what I am mostly interested in is a percentage change variable, which can vary from, say, -20% to 20%. I created a histogram of it (with ods listing). The problem is that I have to plot a line that touches the top of each bar of the histogram from the median to the right (that is, it plots the height of each bin), and then I have to mirror that line to the left of the median. To the left of the median I would generally observe lower values than to the right, so plotting the line on the right and mirroring it on the left will let me see how much mass is missing on the left compared to the right.
So basically, I have two questions. How can I create this line graph, which plots the top of each bin? I assume that I would have to create the binned data first, since I don't see how I can use my histogram code to plot this graph. So, how can I create these 1% interval bins from the data and then plot them? Can I do that with IF statements or a loop? What would be your suggestion?
Also, how do I impose a customized graph on a histogram and not some pre-packaged density curve?
Another thing that I have to mention is that the actual data which I cannot work with has 360 million observations, so I need to make sure that everything is as efficient as possible.
I would greatly appreciate any help. Please let me know if it is not clear.
04-01-2014 11:36 AM
It would help to post an example of some of the data or dummy data or at least an example of the data structure and the code you are currently using to create the histogram.
Also, please post an example of what the final chart should look like.
If you are using a statistical graph produced by a procedure such as UNIVARIATE, you will likely have to summarize the data and use another graphing procedure.
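As a rough illustration of the "summarize, then graph" route, here is a minimal sketch. The names HAVE and PCTCHG are hypothetical placeholders for your data set and percent-change variable, and the bins are assumed to be 1% wide and centered on integers. It is only one way to do this, not necessarily the most efficient for your data.

```sas
/* Hypothetical names: data set HAVE, percent-change variable PCTCHG */

/* A DATA step view assigns each observation to a bin without
   writing a second copy of a very large table to disk */
data binned / view=binned;
   set have(keep=pctchg);
   if missing(pctchg) then delete;
   bin = round(pctchg, 1);        /* nearest integer = midpoint of a 1%-wide bin */
run;

/* One pass over the data; BINCOUNTS contains BIN, COUNT, and PERCENT */
proc freq data=binned noprint;
   tables bin / out=bincounts;
run;

/* Line through the top of each bin */
proc sgplot data=bincounts;
   series x=bin y=percent / markers;
   xaxis label="Percent change (bin midpoint)";
   yaxis label="Percent of observations";
run;
```

No IF/THEN chain or loop is needed for equal-width bins, because ROUND does the bin assignment in a single expression.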
04-01-2014 08:36 PM
Here is the code I am using to create the histogram:
* Define endpoints of histogram at which to accumulate tails;
%let leftTail = -10;
%let rightTail = 10;
* Define histogram bin size;
%let binsize = 1;
* Define horizontal axis limits of histograms;
* If startHisto and endHisto are integers, then 0 will be the midpoint of a bin;
%let startHisto = -20;
%let endHisto = 20;
* Define missing-mass and zero-spike intervals;
%let a = -10; * Start point of missing-mass interval;
%let b = -5; * End point of missing-mass interval;
%let c = -0.5; * Start point of zero-spike interval;
%let d = 0.5; * End point of zero-spike interval;
* Plot histogram;
proc univariate data=temp3histo1 nextrobs=0;
   histogram / midpoints= &startHisto to &endHisto by &binsize;
   inset mean median std skewness p10 p90;
run;
This is basically the structure of the variable (it is a percent change) that I want to bin in 1% intervals so that I can graph the height of each bin, and I have millions of these observations. The other variables are irrelevant for what I am trying to do, although I want to keep all the data. Ideally, I would be able to use my histogram code to graph it, but I have no idea how. I want to preserve the bins and the tails that I have in the histogram: data below the lower tail or above the upper tail accumulate in the corresponding tail.
Do you have an idea how I may be able to do that? Also, is it even possible to combine a customized graph with a histogram?
04-02-2014 08:52 AM
I don't fully understand what line you are trying to graph, but here are two ideas that I think will help:
1) Read the article "How to overlay a custom density curve on a histogram in SAS" on The DO Loop. It shows a GTL template that you can use to accomplish what you want.
2) You can get the midpoints of the histogram by using the OUTHIST= option on the HISTOGRAM statement. See the beginning of the article "Bin observations by using custom cut points and unevenly spaced bins" on The DO Loop.
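For idea 2, a minimal sketch might look like the following. PCTCHG is a hypothetical name for the percent-change variable (the posted code has no VAR statement), HISTBINS is a made-up output name, and the midpoints are written out as literals that match the posted macro values. It also assumes the values have already been clamped to the plotted range, since PROC UNIVARIATE may otherwise adjust the MIDPOINTS= list.

```sas
/* Assumptions: PCTCHG is a hypothetical variable name;
   HISTBINS is a hypothetical name for the output data set */
proc univariate data=temp3histo1 noprint;
   var pctchg;
   histogram pctchg / midpoints=-20 to 20 by 1
                      outhist=histbins;   /* one row per bin */
run;

/* HISTBINS contains _VAR_, _MIDPT_ (bin midpoint),
   _COUNT_ (frequency), and _OBSPCT_ (percent of observations) */
proc print data=histbins;
run;
```

The binned data set has only one row per bin, so every later step works on a few dozen rows no matter how many observations the original table has.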
By combining these ideas, I think you will be 90% of the way to your goal.
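One possible way to combine them is sketched below. It assumes a data set HISTBINS with one row per bin and the variables _MIDPT_ and _OBSPCT_ (the layout that OUTHIST= writes), and it assumes the median falls in the bin centered at 0. The macro variable CENTER and the names MIRROR, MIDPT, OBS_PCT, and MIRROR_PCT are made up for illustration. It uses PROC SGPLOT rather than the GTL template from the article, simply because it is shorter.

```sas
%let center = 0;   /* assumption: midpoint of the bin that contains the median */

/* For each bin, look up the height of the bin at the same distance
   to the RIGHT of the center; bins on the right map to themselves */
proc sql;
   create table mirror as
   select a._midpt_  as midpt,
          a._obspct_ as obs_pct,
          b._obspct_ as mirror_pct
   from histbins as a
        left join histbins as b
        on b._midpt_ = (&center) + abs(a._midpt_ - (&center))
   order by midpt;
quit;

/* Bars show the observed histogram; the line is the right-hand side
   mirrored onto the left */
proc sgplot data=mirror;
   vbarparm category=midpt response=obs_pct;
   series x=midpt y=mirror_pct / lineattrs=(thickness=2);
   xaxis label="Percent change (bin midpoint)";
   yaxis label="Percent of observations";
run;
```

The gap between the line and the bars on the left side (MIRROR_PCT minus OBS_PCT) is the missing mass in each bin. The equality join relies on the midpoints being exact values such as integers; for fractional bin widths, rounding both sides before comparing would be safer.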