JB1_DK
Fluorite | Level 6

I have spread data for two rating classes (AA and AAA) and want to test the spreads for normality within each class. I therefore produce this panel using PROC UNIVARIATE:

JørgenBoysen_0-1612783555233.png

The problem is that the reference line through the Q-Q plot appears to be the same for all ratings, regardless of class.

 

Code used:

proc univariate data=testData;
var Spread;
Class Rating;
qqplot Spread/square normal(MU=EST SIGMA=EST );

run;

 

If I use BY instead of Class, I get individually fitted qq-plots. But that requires sorting and does not provide a panel layout. It should work for Class as well, and display properly in the panel layout.

 

Just for reference, here is the BY output for the AAA rating, clearly not matching the AAA output in the panel layout:

JørgenBoysen_1-1612783962409.png

 

13 REPLIES
RichardDeVen
Barite | Level 11
Can you add the 'testData' data set to the post?
JB1_DK
Fluorite | Level 6

Yes, these are the observations:

 

data testdata;
input rating :$3. Spread :16.8;
datalines;
AA -16.233096
AA -12.366438
AA -6.320539
AA -3.732885
AA 7.516216
AA -4.689121
AA -14.602099
AA 2.9222857069
AA 3.1682522766
AA -14.341467
AA -10.905014
AA 10.171641418
AA -9.329653
AAA -28.50975333
AAA -28.534444
AAA -28.932457
AAA -28.760108
AAA -28.667521
AAA -28.935067
AAA -29.9774
AAA -28.891263
AAA -28.324166
AAA -26.943601
AAA -3.785452
AAA -26.157575
AAA -19.440571
AAA -17.983667
AAA -17.165252
AAA -6.671511
AAA -7.015316
AAA -8.647729
AAA 8.857968
AAA -13.716122
AAA -6.366132
AAA -6.11798
AAA -15.030489
run;
Ksharp
Super User

No, I don't think so. Both plots have the same scale and share the same slope/line.

You can see this if you use the OVERLAY option:

 

proc univariate data=testData;
var Spread;
Class Rating;
qqplot Spread/overlay square normal(MU=EST SIGMA=EST );

run;

I think there must be some option to adjust this line.

@Rick_SAS may know.

JB1_DK
Fluorite | Level 6

"Both plots have the same scale and share the same slope/line"

 

That is exactly the problem: they should not have the same line. They are two separate samples, separated by Rating as per the CLASS statement. BY processing does the correct thing, and CLASS should be able to do the same. I really hope there is some syntax I have not come across that can solve this. OVERLAY makes no difference.

RichardDeVen
Barite | Level 11

Yes, I would report this as a bug.  The normal plot shown in the CLASS based graphs is plotting the Normal Line from the first panel (first CLASS value of AA) in all the panels.  Other than that, the BY and CLASS based analysis results are identical.

BuckyRansdell
SAS Employee

This is a known issue.  In a comparative Q-Q plot (requested with a CLASS statement) the quantiles of the plotted points and the reference line in each cell are computed using parameters of the distribution fitted to the data in the key cell, which by default is the cell in row 1 and column 1.  When you're using a normal distribution the quantiles are unaffected by this, but that is not the case for distributions with shape parameters.

 

Currently you do need to use a BY statement to produce independently-fitted Q-Q plots.
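
For reference, a minimal sketch of that BY workaround using the testData set posted above. The output data set name testData_sorted is only an illustrative choice; the sort is there because BY processing expects the input to be ordered by the BY variable.

```sas
/* BY-group processing requires the input to be sorted by the BY variable */
proc sort data=testData out=testData_sorted;
  by Rating;
run;

proc univariate data=testData_sorted;
  by Rating;
  var Spread;
  /* MU=EST and SIGMA=EST are estimated separately within each BY group */
  qqplot Spread / square normal(mu=est sigma=est);
run;
```

This produces one Q-Q plot per Rating value rather than a single panel.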

JB1_DK
Fluorite | Level 6

Hi Bucky,

 

Thanks for acknowledging that there is a bug.

 

The BY workaround does not offer what the panel layout can provide. The above was just a toy example. If I want to inspect 20+ groups of data for normality, the panel plots would tell me at a glance which ones to focus on and what makes them fail the normality test (outliers, tails, etc.). I can code something myself, but it would be very convenient to have UNIVARIATE provide this.

 

 

RichardDeVen
Barite | Level 11

Until a hot fix that resolves this is issued, you can build your own Q-Q plots with SGPANEL by computing the coordinates to be plotted.

 

Example

  • Proc RANK - Compute BLOM normal quantiles
  • Proc MEANS - Compute mean and std
  • DATA Step - Use QUANTILE, std, and mean to compute the normal reference line end points at the 0.25 and 0.75 quantiles
    • Needs some more thinking on how to extend the line to the plot edges
  • DATA Step - Merge the quantiles with the reference line data
  • SGPANEL - Output 'tight' QQ plots for quick review

Consider the COVID-19 testing data available from the New York State Department of Health. The number of tests performed is plotted by county. To keep this demonstration fast, the data is reduced to every 31st day within each county, and only counties whose names start with A or B are plotted.

 

Fetch the data

* https://data.ny.gov/browse?tags=covid-19 ;

* New York State Statewide COVID-19 Testing;

filename testing temp;
filename headers temp;

%if not %sysfunc(exist(work.testing,data)) %then %do;
  proc http 
    url = 'https://health.data.ny.gov/api/views/xdss-u53e/rows.csv?accessType=DOWNLOAD&api_foundry=true'
    method = "get"
    out = testing
    headerout = headers
  ;
  run;

  proc import datafile=testing dbms=csv replace out=work.testing ;
    guessingrows=all;
  run;
%end;

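* BY processing needs the rows grouped by county, so sort first in case the download is not;
proc sort data=testing;
  by county;
run;

data testing_31;
  set testing;
  by county;
  if first.county then seq=1; else seq+1;
  if mod(seq,31) = 0;
run;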

Plots using only UNIVARIATE and BY. Only one panel per BY value:

proc univariate noprint data=testing_31;
  var Total_Number_of_Tests_Performed;
  *Class county;
  by county;
  qqplot Total_Number_of_Tests_Performed / square normal(MU=EST SIGMA=EST);
  where county < 'C';
  output out=unistats mean=mean std=std;
run;

 

Compute the coordinates for the per county QQ plots and plot them with SGPANEL and SCATTER and SERIES statements.

proc rank 
  data=testing_31
  normal=BLOM
  out=qq(keep=county Total_Number_of_Tests_Performed nq)
;
  by county;
  var Total_Number_of_Tests_Performed ;
  ranks nq;
run;

proc means nway noprint data=testing_31;
  class county;
  var Total_Number_of_Tests_Performed ;
  output out=line mean=mean std=std;
run;

data refline;
  set line;
  
  xn25 = quantile('normal', 0.25);
  xn75 = quantile('normal', 0.75);

  yn25 = xn25 * std + mean;
  yn75 = xn75 * std + mean;

  x = xn25; y = yn25; output;
  x = xn75; y = yn75; output;

  keep county x y;
run;

data plot;
  merge qq refline;
  by county;
  keep county nq Total_Number_of_Tests_Performed x y ;

  if first.county then seq=1; else seq+1;
  if seq > 2 then call missing (x, y);
run;

proc sgpanel data=plot;
  panelby county / columns=3 rows=3;
  scatter x=nq y=Total_Number_of_Tests_Performed;
  series  x=x y=y;

  where county < 'C';
run;

 

SGPANEL output

RichardADeVenezia_0-1612993572814.png
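
On the open point about extending the reference line: one possible sketch, not part of the code above, is to evaluate the same line (y = mean + std * x) at each county's smallest and largest normal quantile instead of at the 0.25 and 0.75 quantiles. It assumes the qq and line data sets created above; the names nqrange, refline_full, nq_min and nq_max are made up for illustration.

```sas
/* Range of the normal quantiles within each county (qq comes from PROC RANK above) */
proc means nway noprint data=qq;
  class county;
  var nq;
  output out=nqrange min=nq_min max=nq_max;
run;

/* Line end points: y = mean + std * x at the smallest and largest quantile */
data refline_full;
  merge line nqrange;
  by county;
  x = nq_min; y = mean + std * x; output;
  x = nq_max; y = mean + std * x; output;
  keep county x y;
run;
```

Using refline_full in place of refline in the merge step should make the line span the plotted points in each cell. It still stops at the outermost points rather than at the axis limits.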

 

JB1_DK
Fluorite | Level 6

Thanks @RichardDeVen , that's a great approach for this.

 

Proc Rank is a real power tool. And I like how your SGPANEL produces more of a grid instead of the vectorized layout from Univariate.

 

Cheers,

Jørgen

RichardDeVen
Barite | Level 11

There is a possible problem when you have a large number of groups with a wide variation in the group-wise values being QQ'd. SGPANEL produces images with uniform axes, so some groups can appear so squashed or stretched that there is no discernible information left. Contrast that with the UNIVARIATE BY/CLASS plots, which use the full plot area for their graph output.
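
One thing that may soften this, which I have not tested against this data: the PANELBY statement has a UNISCALE= option. A sketch, assuming the plot data set from the earlier code:

```sas
proc sgpanel data=plot;
  /* uniscale=column keeps the column (X) axes identical; row (Y) axes are scaled per row */
  panelby county / columns=3 rows=3 uniscale=column;
  scatter x=nq y=Total_Number_of_Tests_Performed;
  series  x=x y=y;
  where county < 'C';
run;
```

Cells in the same row still share a Y axis, so this only helps across rows. Setting columns=1 would give each county its own Y range at the cost of a taller panel.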

 

Another approach, which I have not explored yet, is presenting the UNIVARIATE plots (only the plots) in an ODS lattice LAYOUT which would thus grid-ify full plot area graphs.

JB1_DK
Fluorite | Level 6

@RichardDeVen Hello, sorry for the ping but I just took a look at the code for fetching the testing data. I noticed that you used a %IF/%DO/%END block without running it inside a %MACRO. And it works ?! Did they change something so that you can use macro DO/END blocks in programs without wrapping in a macro?

 

Cheers,

JB

yabwon
Onyx | Level 15

%IF-%THEN-%ELSE in "open code" was added a few years ago (SAS 9.4M5), but it has some limitations: it needs %DO-%END even for a single statement, and it cannot be nested (%IF inside another %IF).
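
A minimal sketch of the open-code form, assuming SAS 9.4M5 or later. The sashelp.class check is only an illustrative condition:

```sas
/* Open-code %IF: each branch needs its own %DO-%END block */
%if %sysfunc(exist(sashelp.class)) %then %do;
  %put NOTE: sashelp.class exists.;
%end;
%else %do;
  %put NOTE: sashelp.class was not found.;
%end;
```

No surrounding %MACRO definition is needed; the block runs as written in a plain program.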

 

Doc: https://go.documentation.sas.com/doc/en/pgmmvacdc/9.4/mcrolref/n18fij8dqsue9pn1lp8436e5mvb7.htm

yabwon_0-1714560285315.png

Blog about it from 2018:

https://blogs.sas.com/content/sasdummy/2018/07/05/if-then-else-sas-programs/

 

 

Bart


JB1_DK
Fluorite | Level 6

Thank you @yabwon , appreciate the links and info
