Hello,
I need to plot a 3-way Venn diagram using the dataset below. I used the code detailed in the appendix of this document: https://support.sas.com/resources/papers/proceedings13/243-2013.pdf, keeping only the parts related to the 3-way diagram. Since I am a beginner in SAS, I don't understand all of the macro, so I have some questions:
- What does the cutoff value mean? In my case, I don't need to set any cutoff; I need to represent the common percentages between the 3 variables.
- In which part of the macro should I specify my variables? (There are variables named A, B and C, but I don't see where it is mentioned what they stand for...)
I am sharing my code, which is basically all the parts I found in the paper for the 3-way diagram. If someone can enlighten me on my questions and guide me through the changes that I have to make in my code, I would be very grateful.
Thank you for your help!
/* Venn Diagram Macro */
%macro venn(data =
,venn_diagram = 3 /* Select whether you want a 2 Way, 3 Way or 4 Way Venn Diagram. EG for 2
way enter 2. Valid values are 2,3 and 4 */
,cutoff = < 0.3 /* Set the P Value cut-off or any other appropriate cut off
Valid values are the right hand side of an if statement */
,GroupA =WBC /* Define group name 1, mandatory */
,GroupB =RBC /* Define group name 2, mandatory */
,GroupC =AT /* Define group name 3, mandatory for 3 and 4 way diagrams */
,out_location = "/home/"
/* Define the path for all output files e.g. C:\Venn Diagrams */
,outputfilename = Venn diagram Test /* Define the filename for the graphic file */
,drilldownfilename = Drilldown
/* Define the filename for the drill down data file */
);
/* Calculate the category for each observation in the dataset
This has to be done differently for 2,3 and 4 way diagrams */
data data_reformatted;
set &data;
run;
/* Counting the overlap */
data data_reformatted2;
set data_reformatted;
%if &venn_diagram = 3 %then %do;
if A ne . and B ne . and C ne . then do;
if A &cutoff and B &cutoff and C &cutoff then ABC = 1;
else ABC = 0;
end;
if A ne . and B ne . then do;
if A &cutoff and B &cutoff and ABC ne 1 then AB = 1;
else AB = 0;
end;
if A ne . and C ne . then do;
if A &cutoff and C &cutoff and ABC ne 1 then AC = 1;
else AC = 0;
end;
if B ne . and C ne . then do;
if B &cutoff and C &cutoff and ABC ne 1 then BC = 1;
else BC = 0;
end;
if A ne . then do;
if A &cutoff and AB ne 1 and AC ne 1 and ABC ne 1 then A1 = 1;
else A1 = 0;
end;
if B ne . then do;
if B &cutoff and AB ne 1 and BC ne 1 and ABC ne 1 then B1 = 1;
else B1 = 0;
end;
if C ne . then do;
if C &cutoff and AC ne 1 and BC ne 1 and ABC ne 1 then C1 = 1;
else C1 = 0;
end;
%end;
run;
/*
COUNTING THE ELEMENTS IN EACH GROUP
After the Macro identifies the elements in each group it uses PROC UNIVARIATE
to sum up the number of elements in each group.
The total number of element within the diagram i.e. the union of Groups A, B, C,
and D, and the total number of elements in the dataset i.e. the universal set are
then calculated. This is used to identify the number of elements that fall outside the union.
*/
proc univariate data =Data_reformatted2 noprint;
var AB A1 B1
%if &venn_diagram > 2 %then %do;
ABC AC BC C1
%end;
; /* ends the VAR statement; the semicolon after %end only ends the macro statement */
output out = data_sum sum = sum_AB sum_A1 sum_B1
%if &venn_diagram > 2 %then %do;
sum_ABC sum_AC sum_BC sum_C1
%end;
; /* ends the OUTPUT statement; sum names follow the order of the VAR list */
run;
/* Counting the number in the universal set */
proc sql noprint;
create table id_count as
select count(id) as count_id
from data_reformatted;
quit;
/* Counting the number inside the union */
data data_sum2;
set data_sum;
totalinside = sum(sum_AB, sum_A1, sum_B1
%if &venn_diagram > 2 %then %do;
,sum_ABC, sum_AC, sum_BC, sum_C1
%end;
);
run;
/*
COUNTING THE ELEMENTS THAT FALL OUTSIDE OF THE UNION
Using the fetch function the values of the total number of elements within the
union and the universal set are fetched from the appropriate datasets and assigned
to a macro-variable. The total number of elements that fall outside the diagram is
then calculated by using %eval to evaluate the arithmetic expression of the number
of elements in the universal set - the number of elements within the union.
*/
/* Calculating the total number of unique ids - so that I can calculate
the number that falls outside of the groups*/
proc sql noprint;
select count_id into: TN
from id_count;
quit;
/* Calculating the total number of values that fall within the groups */
proc sql noprint;
select totalinside into: TI
from data_sum2;
quit;
/* Calculating the total numbers that fall outside all of the groups */
%let TO = %eval(&TN - &TI);
/* Assigning the sums to macro variables */
proc sql noprint;
select sum_A1, sum_B1, sum_AB into :A, :B, :AB
from data_sum2;
quit;
%if &venn_diagram > 2 %then %do;
proc sql noprint;
select sum_C1, sum_AC, sum_BC, sum_ABC into :C, :AC, :BC, :ABC
from data_sum2;
quit;
%end;
data test;
do x = 1 to 100;
y = x;
output;
end;
run;
/*************** 3 WAY VENN DIAGRAMS ***************/
%if &venn_diagram = 3 %then %do;
proc template;
define statgraph Venn3Way;
begingraph / drawspace=datavalue;
/* Plot */
layout overlay / yaxisopts = (display = NONE) xaxisopts = (display = NONE);
scatterplot x=x y=y / markerattrs=(size = 0);
/* Venn Diagram (Circles) */
drawoval x=37 y=40 width=45 height=60 /
display=all fillattrs=(color=red)
transparency=0.75 WIDTHUNIT= Percent HEIGHTUNIT= Percent;
drawoval x=63 y=40 width=45 height=60 /
display=all fillattrs=(color=green)
transparency=0.75 WIDTHUNIT= Percent HEIGHTUNIT= Percent;
drawoval x=50 y=65 width=45 height=60 /
display=all fillattrs=(color=blue)
transparency=0.75 WIDTHUNIT= Percent HEIGHTUNIT= Percent;
/* Numbers */
drawtext "&A" / x=32 y=35 anchor=center;
drawtext "&AB" / x=50 y=30 anchor=center;
drawtext "&B" / x=68 y=35 anchor=center;
drawtext "&ABC" / x=50 y=50 anchor=center;
drawtext "&AC" / x=37 y=55 anchor=center;
drawtext "&BC" / x=63 y=55 anchor=center;
drawtext "&C" / x=50 y=75 anchor=center;
drawtext "Outside Union - &TO" / y=3 x=50 anchor=center width = 30;
/* Labels */
drawtext "&GroupA" / x=30 y=7 anchor=center width = 30;
drawtext "&GroupB" / x=70 y=7 anchor=center width = 30;
drawtext "&GroupC" / x=50 y=98 anchor=center width = 30;
endlayout;
endgraph;
end;
run;
ods graphics on / reset = all border = off width=16cm height=12cm imagefmt = png
imagename = "&outputfilename";
ods listing gpath = "/home/" image_dpi = 200;
proc sgrender data=test template=Venn3Way;
run;
ods listing close;
ods graphics off;
%end;
%mend venn;
%venn (data=mydataset);
Mydataset | % of lipids | Number of observations |
white blood cells (WBC) | 5.20 | 351 |
red blood cells (RBC) | 12.14 | 546 |
adipose tissue (AT) | 7.89 | 962 |
Hi,
I think this paper is better to use:
https://www.pharmasug.org/proceedings/2018/DV/PharmaSUG-2018-DV20.pdf
But first, your dataset needs to be arranged to show which observations are in each of the groups. Please look at the VennData dataset to see how the dataset should be arranged. As a quick example, create a dataset that has 3 variables, such as WBC, RBC and AT. If a record is in all three of WBC, RBC and AT, then all three columns would have the same value.
You seem to have missed this bit of the paper:
data data;
  call streaminit(123);
  do i = 1 to 1000;
    id = i;
    A = RAND('UNIFORM');
    B = RAND('UNIFORM');
    C = RAND('UNIFORM');
    D = RAND('UNIFORM');
    output;
  end;
run;
Because the author hardcoded everything to variables named A, B and C, you likely need to make a data set in which your existing variables are renamed to A, B and C in order to use this macro (moderately poor macro design, and definitely poorly documented).
My guess, based on your code is that you may want something like this:
data toplot;
  set yourdatasetname;
  rename WBC=A RBC=B AT=C;
run;
With the way the macro uses GroupA, GroupB and GroupC I would suggest a more meaningful description than the variable name unless WBC is very meaningful to you.
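Putting those two suggestions together, a call might look roughly like the sketch below. It assumes a hypothetical input dataset yourdatasetname with one row per observation and numeric columns WBC, RBC and AT, which is not the summary table posted in the question. It also adds an id column, because the macro counts rows with count(id); if your data already has an id, drop that line. The label text is only an illustration.

```sas
/* Sketch only: yourdatasetname and its columns are assumed, not taken from the question */
data toplot;
  set yourdatasetname;
  id = _n_;                      /* row identifier used by the macro's count(id) */
  rename WBC=A RBC=B AT=C;       /* the macro only looks at variables named A, B and C */
run;

/* GroupA-GroupC are just the labels drawn next to the circles */
%venn(data=toplot,
      cutoff=< 0.3,
      GroupA=White blood cells,
      GroupB=Red blood cells,
      GroupC=Adipose tissue);
```

The variables are always read from A, B and C; the GroupA, GroupB and GroupC parameters never select columns, they only supply the text placed on the diagram.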
Since the macro references a p-value, try a couple of values for cutoff between 0 and 1. Note that the parameter takes the right-hand side of an IF condition, operator included. I might try < 0.3 and then < 0.7 and see what they do to the appearance of the graph.
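To see what the cutoff actually does, it may help to look at how the text is substituted. Inside the macro, a condition such as A &cutoff simply becomes A < 0.3 before the DATA step is compiled. The small sketch below reproduces that substitution outside the macro; the dataset cutoff_demo and the flag_ variables are made up for illustration.

```sas
%let cutoff = < 0.3;   /* same text as the macro's default cutoff parameter */

data cutoff_demo;
  input A B C;
  /* "A &cutoff" resolves to "A < 0.3"; each flag is 1 when the condition holds, else 0 */
  flag_A = (A &cutoff);
  flag_B = (B &cutoff);
  flag_C = (C &cutoff);
  datalines;
0.10 0.50 0.20
0.40 0.25 0.90
;
run;

proc print data=cutoff_demo;
run;
```

In the first row only A and C satisfy the condition, and in the second row only B does. If your columns were 0/1 membership indicators rather than p-values (an assumption about your data), a cutoff such as > 0 would count every row flagged 1 as belonging to that group.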
I also suggest not attempting to remove any of the code in that macro until you know enough to parse everything. I think that you may have removed some bits.