01-13-2014 12:16 PM
I was trying to see how many two-way tables I could get out of PROC SUMMARY, and I noticed a performance issue with multilabel formats and the MLF option. I think my test must be flawed, but I can't find a way to get the same performance without MLF, even when the formats don't actually have multiple labels and the step just uses the option, as in my example below.
01-14-2014 09:29 AM
What is the question? The second summary appears to use less time and more memory. The extra memory might be attributed to the MLF option. The shorter time might just be disk caching of your input data set.
01-14-2014 12:24 PM
It's not disk caching.
As best I can tell, the performance difference is related to the use of MLF. I just think it is interesting and wondered if anyone had noticed it before, or whether someone from SAS had an explanation. It does take a lot of variables before the performance difference becomes noticeable.
01-14-2014 03:33 PM
I didn't comment because you're not using MULTILABEL and MLF the way they are intended. Your user-defined format does not have any "overlapping" values, so I'm not sure what your test shows. The usual way to specify MLF would be if you had 5 years in the data (1998 - 2002) and you wanted the year categories to be every year "by itself", then 1998-1999 as one category and 2000-2002 as a separate category. So essentially, you are "double counting" each year, once by itself and once in one of these ranges:
1998 and 1999
2000 through 2002
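The year grouping described above could be sketched roughly as follows. This is only an illustration: the format name `yrfmt`, the data set `work.sales`, and the variable `amount` are made up for the example and do not come from the original test.

```sas
/* Hypothetical multilabel format: every year by itself, */
/* plus two ranges that overlap the single-year labels.  */
proc format;
   value yrfmt (multilabel)
      1998        = '1998'
      1999        = '1999'
      2000        = '2000'
      2001        = '2001'
      2002        = '2002'
      1998 - 1999 = '1998 and 1999'
      2000 - 2002 = '2000 through 2002';
run;

/* Hypothetical test data: one row per year. */
data work.sales;
   do year = 1998 to 2002;
      amount = 100;
      output;
   end;
run;

/* MLF on the CLASS statement asks PROC SUMMARY to use */
/* all of the overlapping labels, not just the first.  */
proc summary data=work.sales nway;
   class year / mlf;
   format year yrfmt.;
   var amount;
   output out=work.year_totals sum=total_amount;
run;
```

With a format like this, each year's amount contributes to two output rows: its own single-year row and the row for the range that contains it.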
You might have the same results, you might not. As we say in the Advanced Programming class -- Your Mileage May Vary.
01-14-2014 03:58 PM
I removed the "true" MULTILABEL from my example to make the two steps equivalent in the output they produce: the same number of observations from the same number of variables and crossings.
PROC FORMAT and PROC SUMMARY don't seem to care if you used MULTILABEL as it was intended or not.
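A minimal sketch of that kind of comparison is shown here. It is not the original test code: the format names `agegrp` and `agemlf` are hypothetical, and `sashelp.class` is far too small to show any timing difference, so a real test would need many more class variables and rows.

```sas
/* Write detailed time and memory statistics to the log. */
options fullstimer;

/* Hypothetical formats with identical, non-overlapping ranges. */
/* Only the second one is flagged MULTILABEL.                   */
proc format;
   value agegrp
      low - 12  = 'Child'
      13 - high = 'Teen';
   value agemlf (multilabel)
      low - 12  = 'Child'
      13 - high = 'Teen';
run;

/* Step 1: ordinary format, no MLF. */
proc summary data=sashelp.class;
   class sex;
   class age;
   format age agegrp.;
   var height weight;
   output out=work.plain mean=;
run;

/* Step 2: same crossings, multilabel format with MLF. */
proc summary data=sashelp.class;
   class sex;
   class age / mlf;
   format age agemlf.;
   var height weight;
   output out=work.with_mlf mean=;
run;
```

Since the ranges do not overlap, both steps should produce the same crossings, and the log notes for the two steps can be compared directly. One difference to expect, as far as I know, is that with MLF the class variable is stored in the output data set as a character variable holding the formatted value.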