Hi,
I have a general question about the dataset that is read into a graphic procedure like SGPlot.
When building a visualisation across several datasets (for example AE, Dosing plus reflines) do you have to build one highly denormalised dataset that spans all the source domains or is their an ability to join the source datasets within the proc statement?
I come from a database build perspective and dislike the idea of building the highly denormalised dataset just to feed it into a proc statement. From my point-of-view it would be an advantage if the "data=" assignment could reference a SQL join statement. I understand that a work area temp table can be built for this purpose but for me this isn't a 'good practice' method.
If anyone has seen a white paper that provides background in this area of building visualisations please let me know.
Thanks for your assistance. (Using SAS 9.4)
Here are two other techniques (with traditional SAS/Graph), which I often use in complex graphs and dashboards that need to pull data from multiple sources, that don't require combining the data all into 1 dataset ...
I often use an annotate dataset to overlay layers onto a graph (a simple example would be overlaying a line on a bar chart).
And I often create several individual charts from single datasets, and then use Proc Greplay to overlay them together (either on top of each other, or tiled as a dashboard).
The SGPLOT procedure leaves the data manipulation to SAS. SAS provides many procedures and the Data Step for you to combine your data in the way you want. SGPLOT only plots the data. For your specific issue, I presented a paper last year on how to combine the data to produce a combined AE+CM graph. You could leverage the ideas to combine other data over a common time line.
I come from a database build perspective and dislike the idea of building the highly denormalised dataset just to feed it into a proc statement.
You could create a view instead but to be honest standard star or snowflake schemas don't provide data in a form that can be used for statistical analysis or machine learning, in general. So rather than a lot of views analysts generally prefer that highly denormalized data set for analysis.
At least SAS uses standard table structure, the other languages, D3 and JavaScript libraries like JSON and XML and those make my head hurt.
SAS graphing procedures read data from a single table. You can avoid creating a temporary table with a SQL view which is run only when the graphic proc is executed.
Do you have an example of a SG procedure driven by a view? Would the view be linked to the "data" statement?
I assume using GTL will also give you more control over the input dataset but for the moment I would like to understand the SG procedures.
GTL still needs only one data set, you use SGRENDER to create the graph in the end and it also only handles one data set.
Useless example here:
data demo / view=demo;
Set sashelp.class;
Run;
Proc sgplot data=demo;
Scatter x=height y=weight;
Run;
Here is an example combining two SASHELP datasets into a SQL view
proc sql;
create view allBMI as
select
"Heart Study" as source,
ageAtStart as age,
(weight*0.45359237)/(height*0.0254)**2 as BMI
from sashelp.heart
union all
select
"NCHS",
age,
BMI
from sashelp.bmimen;
quit;
proc sgplot data=allBMI;
scatter y=bmi x=age / group=source markerattrs=(size=3) jitter;
run;
Here are two other techniques (with traditional SAS/Graph), which I often use in complex graphs and dashboards that need to pull data from multiple sources, that don't require combining the data all into 1 dataset ...
I often use an annotate dataset to overlay layers onto a graph (a simple example would be overlaying a line on a bar chart).
And I often create several individual charts from single datasets, and then use Proc Greplay to overlay them together (either on top of each other, or tiled as a dashboard).
Thanks for the overlay idea. It sounds like it may get around the requirement to denormalise datasets.
In one of my graphs I have a dataset that is used to create reflines - the number of lines can be anything between 0 through 15 per plot. So currently i am joining this refline dataset with the "main" dataset and have 15 denormalised columns (one for each refline.) It sounds that using this overlay approach I dont need to join the datasets.
I will look into your 2 techniques - hope they arent too difficult to follow!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.