11-17-2014 07:17 AM
I have a numeric variable ID which I read as 15., although it actually consists of 5 segments (I1-I5). When I do a PROC SUMMARY NWAY with CLASS ID, some ID values seem to have up to 275 rows, but if I do PROC SUMMARY NWAY with CLASS I1 I2 I3 I4 I5, the maximum number of rows per "CLASS" is 4 (which is correct). Merging with the original data set by ID works well if I use I1-I5 in the CLASS statement and add ID in an ID statement of PROC SUMMARY. It seems as if PROC SUMMARY uses a different precision when the variable is in the CLASS statement than when it is in the ID statement. Is this something that is obvious and that I should know/understand? It probably is ;-)
11-17-2014 10:36 AM
It sounds like the concept you need to know/understand is the automatic variable _TYPE_ that is part of the output data set. Find out how it relates to the CLASS statement, as well as the NWAY option. That likely answers all your questions.
11-18-2014 01:41 AM
No, I don't think so, because it seems to be related to the numeric precision. If you try this simple example:
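data a;
input @1 identity 15. @1 i1 2. @3 i2 6. @9 i3 4. @13 i4 1. @14 i5 2. @16 code 2.;
/* supply the records via DATALINES or INFILE here */
proc sort data=a; by identity; run;
/* Group on the five segments; carry identity along with the ID statement */
proc summary nway data=a; class i1 i2 i3 i4 i5; var code;
output out=ut1 n=number; id identity; run;
proc print data=ut1; run;
data b1; merge a ut1; by identity; run;
proc print data=b1; run;
/* Group directly on the 15-digit numeric identity */
proc summary nway data=a; class identity; var code;
output out=ut2 n=number; run;
proc print data=ut2; run;
data b2; merge a ut2; by identity; run;
proc print data=b2; run;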
You will see that PROC SUMMARY cannot handle identity at that length in the CLASS statement, but in the ID statement it works OK, since merging on that variable provides the correct match. A bit weird that the precision seems to be different within the same procedure?!
One solution (thanks SASKiwi) seems to be to read identity as $15. But I still think it is weird ;-)
11-17-2014 01:29 PM
You might also want to consider whether it is sensible to store a 15-digit classification variable as a number. 15 digits is getting near the limit of the 8-byte numeric precision of SAS, so it could make more sense to store this as a character variable, which would also make identifying your 5 segments easier.
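A minimal sketch of the character-variable approach, in case it helps. The three data records, the data set names a_char and ut_char, and the column layout (15-digit ID followed by a 2-digit code, as in the INPUT statement posted in this thread) are assumptions made up for illustration:

```sas
/* Hypothetical records: a 15-digit ID in columns 1-15, a 2-digit code in 16-17 */
data a_char;
  /* identity is read as character; the segments are still read as numbers */
  input @1 identity $15. @1 i1 2. @3 i2 6. @9 i3 4. @13 i4 1. @14 i5 2. @16 code 2.;
  datalines;
12345678901234501
12345678901234502
12345678901234603
;
run;

/* Group directly on the character ID; all 15 characters take part in the comparison */
proc summary nway data=a_char;
  class identity;
  var code;
  output out=ut_char n=number;
run;

proc print data=ut_char;
run;
```

With these made-up records the output should contain two rows, one per distinct ID, because the two IDs differ only in their last digit and that digit is kept when the value is character.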
11-18-2014 01:42 AM
Thanks, that is indeed a solution to my problem! However, I still think it is a bit weird that PROC SUMMARY handles the numeric precision differently depending on whether the variable is in the CLASS statement or in the ID statement (see another answer for an example).
11-18-2014 09:33 AM
Not necessarily weird so much as just unexpected, when you consider what happens behind the scenes with the two statements. In the ID statement, values are moved as literals and attached to the output. In the CLASS statement, values are aggregated, using the 8-byte precision in the equality comparison that classifies each observation.
At least that is how I was taught about it. Might be wrong though...
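Another possible explanation, worth checking against the PROC MEANS/SUMMARY documentation: as far as I know, CLASS variables are grouped by their formatted values, and a numeric variable with no format attached falls back to BEST12., which cannot show all 15 digits. ID variables are simply copied. If that is what is happening here, either of the two variations below should keep distinct 15-digit IDs apart. This is only a sketch: the records and the data set names a_demo, ut3 and ut4 are made up for illustration.

```sas
/* Hypothetical records: a 15-digit numeric ID followed by a 2-digit code */
data a_demo;
  input @1 identity 15. @16 code 2.;
  datalines;
12345678901234501
12345678901234502
12345678901234603
;
run;

/* Variation 1: attach a format wide enough to show every digit of the ID */
proc summary nway data=a_demo;
  class identity;
  format identity 15.;
  var code;
  output out=ut3 n=number;
run;

/* Variation 2: ask the CLASS statement to group on internal (unformatted) values */
proc summary nway data=a_demo;
  class identity / groupinternal;
  var code;
  output out=ut4 n=number;
run;

proc print data=ut3; format identity 15.; run;
proc print data=ut4; format identity 15.; run;
```

Running the same step with a plain CLASS identity and no FORMAT statement, then comparing the row counts, would show whether the formatted-value grouping is really the cause.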