Hello,
I need help with a graph made with proc sgpanel.
I made a graph with categories, but they don't come out in the right order. It shows the methode variable, with the logcpm variable divided into the categories "-10 -6", "-6 -2", "-2 2", "2 6" and "6 8". But the category "-2 2" comes before the category "-6 -2". I have searched everywhere and tried sorting the data, but it does not change anything.
I have 1827 rows and 7 variables. Thanks for your help.
/* Create the group variable */
data v2myosi.group2;
set V2myosi.fc2;
if logcpm <-6 then group = "-10 -6";
if -6 =< logcpm < -2 then group = "-6 -2";
if -2 =< logcpm < 2 then group = "-2 2";
if 2 =< logcpm < 6 then group = "2 6";
if 6 =< logcpm then group = "6 11";
run;
/* Create the methode variable */
Data V2myosi.group1;
set V2myosi.group;
if total = 3 then methode = "D E L";
if total = 2 and Edge = 1 and Deseq = 1 then methode = "E D";
if total = 2 and Edge = 1 and Limma = 1 then methode = "E L";
if total = 2 and Deseq = 1 and Limma = 1 then methode = "D L";
if total = 1 and Edge = 1 then methode = "E";
if total = 1 and Deseq = 1 then methode = "D";
if total = 1 and Limma = 1 then methode = "L";
run;
title 'Myosite : LogCPM en groupes et par méthode';
proc sgpanel data=V2myosi.group1;
styleattrs datacolors=(lightgray lightred mediumpurple orange cyan green yellow) datacontrastcolors=(black black);
panelby group / layout=columnlattice onepanel HEADERBACKCOLOR=pink HEADERATTRS=GraphLabelText
colheaderpos=bottom rows=1 novarname noborder ;
vbar methode / group=methode stat=sum nostatlabel datalabel ;
colaxis display=none;
rowaxis grid;
run;
Skip this entire step
data v2myosi.group2;
set V2myosi.fc2;
if logcpm <-6 then group = "-10 -6";
if -6 =< logcpm < -2 then group = "-6 -2";
if -2 =< logcpm < 2 then group = "-2 2";
if 2 =< logcpm < 6 then group = "2 6";
if 6 =< logcpm then group = "6 11";
run;
Create a format like this:
proc format;
  value logcpmgrp
    low -< -6 = '-10 -6'
    -6 -< -2 = '-6 -2'
    -2 -< 2 = '-2 2'
    2 -< 6 = '2 6'
    6 - high = '6 11';
run;
Use the logcpm variable instead of "group" and assign the format to that variable. (Note: using keywords such as Group as variable names leads to confusion.)
proc sgpanel data=V2myosi.group1;
styleattrs datacolors=(lightgray lightred mediumpurple orange cyan green yellow) datacontrastcolors=(black black);
panelby logcpm / layout=columnlattice onepanel HEADERBACKCOLOR=pink HEADERATTRS=GraphLabelText
colheaderpos=bottom rows=1 novarname noborder ;
format logcpm logcpmgrp.;
vbar methode / group=methode stat=sum nostatlabel datalabel ;
colaxis display=none;
rowaxis grid;
run;
You may want to sort your data by logcpm.
Groups created by formats are honored in analysis, reporting and in most cases graphing procedures (some exceptions with custom date, time and datetime formats).
Your problem comes from the sort order of character values. "-2 2" comes before "-6 -2" because character values are compared one character at a time. Both start with "-", so the comparison moves to the second character, and 2 sorts before 6. With a numeric variable, the sort uses the underlying value, not the formatted value (some procedures have an option to use the formatted values explicitly).
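The character-by-character comparison is easy to reproduce. The sketch below is only an illustration: the data set name labels and the variable grp_label are made up for the example, and the resulting order assumes an ASCII-based system, where the hyphen sorts before the digits.

```sas
/* Hypothetical demo data: the five category labels stored as text */
data labels;
  input grp_label $char6.;
  datalines;
-10 -6
-6 -2
-2 2
2 6
6 11
;
run;

/* Character sort: values are compared one character at a time */
proc sort data=labels;
  by grp_label;
run;

/* Print the labels in their sorted order */
proc print data=labels noobs;
run;
```

On such a system, "-2 2" is listed ahead of "-6 -2", because the second characters (2 and 6) decide the comparison.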
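Note: The data steps you show and the proc sgpanel code do not line up. The first step writes V2myosi.group2 but the second reads V2myosi.group, so it is not clear that the variable named group ends up in V2myosi.group1.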
Thank you for your help.
I did what you said and the order is now correct, but I can't get the format to apply. Can you help me?
data v2myo.group3;
set V2myo.group1;
if logcpm <-6 then grp = 1;
if -6 =< logcpm < -2 then grp = 2;
if -2 =< logcpm < 2 then grp = 3;
if 2 =< logcpm < 6 then grp = 4;
if 6 =< logcpm then grp = 5;
run;
libname V2myo 'C:\Users\771\Documents\IMRB\Sclero\Sclero V2 myo 21 fev 2023';
proc format lib=V2myo;
value grp
1 = "-10 -6"
2 = "-6 -2"
3 = "-2 2"
4 = "2 6"
5 = "6 8";
run;
title 'Myopathie : LogCPM en groupes et par méthode';
proc sgpanel data=V2myo.group3 ;
styleattrs datacolors=(lightgray lightred mediumpurple cyan green yellow) datacontrastcolors=(black black);
panelby grp / layout=columnlattice onepanel HEADERBACKCOLOR=pink HEADERATTRS=GraphLabelText
colheaderpos=bottom rows=1 novarname noborder SORT=ASCFORMAT ;
vbar methode / group=methode stat=sum nostatlabel datalabel ;
colaxis display=none;
rowaxis grid;
run;
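One likely reason the labels do not appear: proc format with lib=V2myo stores the format in the V2myo.FORMATS catalog, which SAS only searches when it is named in the FMTSEARCH option, and the proc sgpanel step has no FORMAT statement attaching grp. to the variable. A minimal sketch follows, reusing the data set V2myo.group3 and the grp format from the post above. It assumes the libname is submitted before any step that uses V2myo, and it leaves out SORT=ASCFORMAT on the assumption that sorting by formatted value would bring back the character ordering.

```sas
/* Assumption: this libname runs first, before any step that reads V2myo */
libname V2myo 'C:\Users\771\Documents\IMRB\Sclero\Sclero V2 myo 21 fev 2023';

/* Let SAS look for formats in the V2myo.FORMATS catalog as well */
options fmtsearch=(V2myo work library);

proc sgpanel data=V2myo.group3;
  /* Attach the format; assumed default order follows the numeric codes 1 to 5 */
  format grp grp.;
  panelby grp / layout=columnlattice onepanel rows=1 novarname;
  vbar methode / group=methode datalabel;
  colaxis display=none;
  rowaxis grid;
run;
```

The styling options from the original step (styleattrs, header colors and so on) can be added back unchanged.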
Thank you so much for your help. It's perfect. I will take into account all your comments to improve myself.
Sincerely yours,
Nathalie
Formats are a very powerful tool available in SAS.
Consider this scenario: you have a very large data set and use code similar to what you attempted for the Group variable, and it takes an hour or more to run. Then your boss comes in and says, "I want to see what the difference is when we change the group boundaries to -5 and 5 instead of -6 and 6." If you are adding variables, you pay that time penalty again to add the new variable, and you have to change the code to use it. With the format approach, you create a new format and rerun the code, changing only the name of the format (and the corresponding Class variable reference, if one is used).
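As a rough sketch of that scenario, two formats with different boundaries are defined once, and the same step is rerun with only the format name changed. The data set scores, the variable x, and the format names cut6grp and cut5grp are hypothetical, made up for this illustration.

```sas
/* Hypothetical data: a few numeric values to be grouped */
data scores;
  input x @@;
  datalines;
-8 -5.5 -3 0 4 5.5 7
;
run;

proc format;
  /* Boundaries at -6 and 6 */
  value cut6grp low -< -6 = 'below -6'
                -6  -<  6 = '-6 to <6'
                 6 - high = '6 and above';
  /* Alternative boundaries at -5 and 5 */
  value cut5grp low -< -5 = 'below -5'
                -5  -<  5 = '-5 to <5'
                 5 - high = '5 and above';
run;

/* Same step, run twice; only the format name changes */
proc freq data=scores;
  tables x;
  format x cut6grp.;
run;

proc freq data=scores;
  tables x;
  format x cut5grp.;
run;
```

With these seven values, the first run puts 1, 5 and 1 observations in the three groups, and the second puts 2, 3 and 2, without the data set itself being touched.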
Another advantage: suppose you have a different variable (or several), possibly from a different instrument or collection source, that you want to analyze with the same boundaries. The only change to the code is to use the new variable in the Class statement (if needed) and associate the format with it, instead of adding a bunch of variables.
The main drawback of formats like this is that they apply only to single variables. Any value that relies on two or more variables (mostly) requires adding variables.
Another advantage, with some experience, is that you can create the format from a data set of values, whether boundaries as in this problem or just a list of values and the desired display text. Think of something like turning postal codes into geographic regions, or branch office names into the supervising vice president.
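Building a format from a data set is done with the CNTLIN= option of proc format, which expects at least the variables FMTNAME, START and LABEL. A small sketch follows. The postal codes, region names, data set names and the $region format are all invented for the example.

```sas
/* Hypothetical lookup table: one row per postal code and its region */
data region_map;
  /* TYPE='C' makes this a character format, referenced as $region. */
  retain fmtname 'region' type 'C';
  length start $5 label $10;
  input start $ label $;
  datalines;
75001 North
13001 South
69001 East
;
run;

/* Turn the lookup table into a format */
proc format cntlin=region_map;
run;

/* Hypothetical data to summarize by region */
data offices;
  length postal $5;
  input postal $;
  datalines;
75001
13001
75001
69001
;
run;

proc freq data=offices;
  tables postal;
  format postal $region.;
run;
```

When the mapping changes, only the lookup data set needs to be edited and the proc format step rerun.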
You are quite right. I have rarely used formats, but you have demonstrated and explained to me that they are very useful and effective in saving time, especially in research. Thanks again for taking the time to help me. I really like SAS because there is a lot of help for people in difficulty. If I don't succeed in doing my analyses with SAS, I will have to learn R. Unfortunately, this is the tool used in bioinformatics.
Sincerely yours,
Nathalie