BookmarkSubscribeRSS Feed
Deryl
Calcite | Level 5

I'm new on a project and inherited some code to work on that uses proc shewhart, which I've never used. During testing, I realized that this procedure produces different output each time I run it even without changing the input dataset. Here is the code:

 

proc shewhart data=fac;
  by mpin;
  xschart sum_amt_paid*yearmo/nochart outtable=out1;
  id sum_claims;
run;

 

The dataset fac has 69,090,035 observations. Each time I run that step, without changing fac at all, I get different counts ( always around 63 million obs) in dataset out1.

 

Is this expected behavior? If not, any thoughts on what might be the cause?

10 REPLIES 10
PaigeMiller
Diamond | Level 26

@Deryl wrote:

I'm new on a project and inherited some code to work on that uses proc shewhart, which I've never used. During testing, I realized that this procedure produces different output each time I run it even without changing the input dataset. Here is the code:

 

proc shewhart data=fac;
  by mpin;
  xschart sum_amt_paid*yearmo/nochart outtable=out1;
  id sum_claims;
run;

 

The dataset fac has 69,090,035 observations. Each time I run that step, without changing fac at all, I get different counts ( always around 63 million obs) in dataset out1.

 

Is this expected behavior? If not, any thoughts on what might be the cause?


You'd need to show us the different outputs, and convince us (withouth trying to send us the 69090035 observations) that the input data hasn't changed and yet you get different results. I think everyone here would say that's not possible.

--
Paige Miller
Deryl
Calcite | Level 5

Should the number of rows in the input dataset equal the number in the output dataset? If so, I can run the code showing counts before and after and post it. Otherwise, I'm not sure what I could do here. Futhermore, I'm sure that your assumption of user error is correct, I'm just having trouble finding my way to what it is I'm doing wrong.

PaigeMiller
Diamond | Level 26

@Deryl wrote:

Should the number of rows in the input dataset equal the number in the output dataset?


NO, the OUTTABLE data set will not have the same number of rows as the input data set, unless as I said, there is only one observation in each level of YEARMO.

 

I'm just having trouble finding my way to what it is I'm doing wrong.

 

It's not clear what you want PROC SHEWHART to do. How does this relate to your original problem?

--
Paige Miller
Deryl
Calcite | Level 5

Shewhart is being used to identify lines were _exlim_='UPPER'.

 

This relates to my original problem in that I need a way to prove my observation. I asked about counts because I thought that would be an easy way, but it appears that I was mistaken there.

Deryl
Calcite | Level 5

I'm sorry, I forgot to include these warnings from the log:

 

WARNING: No variability was encountered in the data for process variable SUM_AMT_PAID.
WARNING: Control limits cannot be established for SUM_AMT_PAID.
WARNING: A control chart has not been completed for SUM_AMT_PAID due to insufficient subgroups with at least two observations.

 

There are 36 total wranings, each is one of the above three. I'm assuming that some groupings of my "by" values have insufficient data. Is this maybe the problem?

PaigeMiller
Diamond | Level 26

This would happen if there was only one observation for each level of variable YEARMO.

 

It doesn't explain your original symptoms (or maybe it does, I guess I have never been in this situation).

--
Paige Miller
Quentin
Super User

No, those warnings wouldn't explain getting different output datasets when you re-run the PROC SHEWHART. 

 

PROC SHEWHART isn't doing any random sampling or anything.  I can't think of any reason you would get different results from running it twice.

 

I would try saving your fac dataset as a perm dataset, then run (in a fresh session) something like:

 

proc shewhart data=lib.fac;
  by mpin;
  xschart sum_amt_paid*yearmo/nochart outtable=out1;
  id sum_claims;
run;

proc shewhart data=lib.fac;
  by mpin;
  xschart sum_amt_paid*yearmo/nochart outtable=out2;
  id sum_claims;
run;

proc compare base=out1 compare=out2 error;
run;

If OUT1 and OUT2 don't compare equal, I'd be very disappointed.  And I'd suggest sending it in to tech support.  You could also grab a 1% sample of the data and repeat the test, just for fun.

 

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Rick_SAS
SAS Super FREQ

So that I can simulate this situation, can you please specify:

1. How many BY groups in MPIN?

2. What is the average number of values of YEARMO in a typical BY group?

3. What is the average sample size for each value of YEARMO?

Rough estimates are fine. I assume that the product of these three values is approximately the 69M = number of observations.

Deryl
Calcite | Level 5

Rick,

 

I will get this information if still needed, but I played around with the data over the weekend and maybe I've solved my issue. I am hoping to get confirmation that things are as they should be. Let me list my findings.

 

1. WHen I first posted this issue, I was running two EG session. My understanding was that they had two different work spaces, but maybe that was the cause of my unreplicating results. When I have only one EG sessions, rerunning the shewhart gives the same results.

2. The results in (1) are missing some values of mpin. Is that normal?

3. If close and reopen EG, running the DB query to get the data and then rerunning shewhart gives me different results. The database is static. The counts for the DB query match each time.

4. (3) led me to believe my results are dependent on the order the data is in. Previously I only had the data sorted by mpin. When I sort the data by mpin and yearmo I get the same results every time. It doesn't matter if I repull the data from the database or not. I can close and reopen EG; still the same results. It also now contains all of the values for mpin. However, the output is around 3M rows when previously they were around 63M.

 

When I was reading about shewhart I did not think I needed to sort by yearmo. Was that mistake by me?

Quentin
Super User
  1. Two EG sessions should have two different work spaces.  I don't think this would explain different results from two different Shewhart runs on the same data.
  2. Missing values of MPIN does not sound normal, unless perhaps if there are missing values of MPIN in the source data?
  3. I suppose it's possible each time you re-query the database, you may be getting records in a different order.  SQL doesn't care about order, if you don't specify an order.
  4. If you don't sort by YEARMO, what does your log look like?  I ran below code:
proc sort data=sashelp.prdsale out=mysale;
  by country;
run;

proc shewhart data=mysale;
  by country;
  xschart actual*year/nochart outtable=myout;
  id prodtype;
run;

And I get errors in the log because it is not sorted by year:

 

ERROR: The values of the subgroup variable YEAR are not sorted in increasing order in the
       data set MYSALE.
NOTE: Recommended practice is to sort the data by the subgroup variable (within BY groups
      if any) with PROC SORT.
BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1888 views
  • 4 likes
  • 4 in conversation