Hello our esteemed advisers and community,
I am seeking assistance with plotting graphs for large data sets. I have a very large data set with over 100,000,000 observations. When I try to plot calibration graphs, SAS returns an error saying the maximum number of observations has been exceeded. I tried raising the maximum-observations option, but I still cannot plot more than 2,000,000 observations.
I will be glad to be advised on how to handle this problem.
With that many observations, you should probably use a HEATMAP instead of a SCATTER plot. Compare the output from these two steps:
proc sgplot data=sashelp.heart;
heatmap x=weight y=height;
run;
proc sgplot data=sashelp.heart;
scatter x=weight y=height;
run;
The HEATMAP statement will process the data in a way that will greatly reduce the actual number of observations processed by the rendering code.
Hope this helps!
Dan
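To illustrate the kind of reduction binning gives, here is a minimal sketch that does the binning by hand and plots only the bin counts. It is conceptually similar to what HEATMAP does, but it is not a description of SAS's internal implementation. The bin widths (5 for weight, 1 for height) and the names binned, bin_counts, xbin, ybin, and count are assumptions made up for this example.

```sas
/* Assign each observation to a bin (bin widths are assumed; tune to your data) */
data binned;
  set sashelp.heart(keep=weight height);
  if nmiss(weight, height) = 0;   /* drop rows with a missing coordinate */
  xbin = round(weight, 5);        /* 5-unit-wide bins on the X axis */
  ybin = round(height, 1);        /* 1-unit-wide bins on the Y axis */
run;

/* Count the observations in each (xbin, ybin) cell */
proc summary data=binned nway;
  class xbin ybin;
  output out=bin_counts(drop=_type_ rename=(_freq_=count));
run;

/* Plot one colored cell per bin instead of one marker per observation */
proc sgplot data=bin_counts;
  heatmapparm x=xbin y=ybin colorresponse=count;
run;
```

Because only the aggregated bin_counts dataset reaches the renderer, the number of plotted rows depends on the number of bins, not on the number of input observations.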
Or check PROC KDE + BIVAR statement.
Hello my advisors,
The heat map and the scatter plot both return the same message about exceeding 2,000,000 observations. I have also tried PROC KDE with the BIVAR statement; the procedure started running, but then it reported that it was out of memory.
This out-of-memory problem has also prevented the heat map approach.
I will be glad for additional advice.
Could you give your SAS session more memory by starting it with the option -memsize 0?
proc kde data=octane;
bivar Rater Customer / plots=CONTOUR;
run;
Or use OUT= to save the dataset and plot it later.
ods select none;
proc kde data=octane;
bivar Rater Customer / out=want;
run;
ods select all;
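Before rerunning, it may help to confirm how much memory the session actually has. A small sketch, assuming you only want to inspect the current setting (MEMSIZE itself is set when SAS is started, not from inside a running session):

```sas
/* Show the current MEMSIZE setting and where it was set */
proc options option=memsize value;
run;

/* Same value, written to the log as a single line */
%put MEMSIZE is %sysfunc(getoption(memsize));
```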
Thank you Ksharp for the assistance.
The procedure below works and I have been able to draw one of the graphs from a large dataset.
proc kde data=octane; bivar Rater Customer / plots=CONTOUR; run;
There is one more thing I would like your help with. The graph produced has a black background, and I would like to change the background to another color, such as white, and make the plotted points clearer. Also, if possible, I would like to place a line at a 45-degree angle to indicate perfect calibration, so that the deviation from the line is clearly visible.
I have attached the output graph to this reply. Please check it and advise me on how to improve its appearance.
I will be glad for any help in making it look better.
I would rather not download a DOC file from a website because of the virus risk.
Check PROC KDE's documentation. Maybe you could find an option you need.
" if possible place a line at 45 angle to indicate the straight line for a perfect calibration so that it clearly shows the deviation from the line. "
I would suggest saving the output as a dataset and plotting it with PROC SGPLOT and a LINEPARM statement.
ods trace on;
proc kde......
ods trace off;
This gives you the NAME of the output object, which you can then save as a dataset like this:
ods output xxxxxx=xxxxxxx;
proc kde..............
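As a rough sketch of that suggestion, the steps below save the bivariate density grid with OUT= on the BIVAR statement (rather than ODS OUTPUT), then draw it with SGPLOT on a white wall with a 45-degree reference line. The dataset calib and the variables predicted and observed are hypothetical stand-ins for your calibration data, and the color ramp is just one choice. I am assuming the OUT= dataset holds the grid coordinates in value1 and value2 and the estimate in density; please check that against your own output.

```sas
/* Hypothetical calibration data: names and values are made up for illustration */
data calib;
  call streaminit(12345);
  do i = 1 to 100000;
    predicted = rand('uniform');
    observed  = predicted + 0.05*rand('normal');
    output;
  end;
  drop i;
run;

/* Save the density grid without producing the default plots */
ods select none;
proc kde data=calib;
  bivar predicted observed / out=density_grid;
run;
ods select all;

/* Plot the grid on a white wall and overlay the line y = x */
proc sgplot data=density_grid;
  styleattrs wallcolor=white;
  heatmapparm x=value1 y=value2 colorresponse=density /
              colormodel=(white yellow red);
  lineparm x=0 y=0 slope=1 / lineattrs=(color=black pattern=dash);
  xaxis label='Predicted';
  yaxis label='Observed';
run;
```

Since the plot is drawn from the density grid and not from the raw observations, the rendering step sees only a few thousand rows regardless of how large the input is.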
Although this one has been solved, I wanted to mention another solution.
You could use SAS/GRAPH PROC GPLOT, which doesn't seem to have a limit on the number of observations plotted.
For example, I created a random dataset with 100,000,000 observations, used semi-transparent points as my plot marker, and came up with the following plot.
I ran this on a Linux machine, and lifted the default SAS memory size limit by starting the SAS session with "sas -memsize 0 &" ... and it took, umm, "quite a while" to run ... cpu time 3:55:37.78 (but it did run!)
data my_data (keep = x y);
seed1=12345;
seed2=98765;
do count=1 to 100000000;
call rannor(seed1,x);
call rannor(seed2,y);
output;
end;
run;
symbol1 value=point interpol=none color=A00000033;
proc gplot data=my_data;
plot y*x=1;
run;
Just to see what is going on, I ran Robert's data with SGPLOT, using some options to allow larger data (maxobs=100000000).
The SCATTER plot still did not finish due to lack of Java memory.
However, the HEATMAP did run with default SAS startup settings and the graph completed in 02:20:63 of real time on a Windows box. The heat map is a better visual, as you can actually see the distribution of the data instead of a big black blob. :-) Binning is done on the server, and only the resulting bins are sent to the Java renderer for drawing. I used 320x240 bins, as there is no point in creating bins smaller than a pixel. This gives 2x2 pixel bins. Image attached.
data my_data (keep = x y);
seed1=12345;
seed2=98765;
do count=1 to 100000000;
call rannor(seed1,x);
call rannor(seed2,y);
output;
end;
run;
ods html close;
ods listing gpath='C:\temp';
ods graphics / maxobs=100000000 NXYBINSMAX=80000 imagename='BigData';
proc sgplot data=my_data;
heatmap x=x y=y / nxbins=320 nybins=240 colormodel=(green yellow red);
run;