I have a numeric variable ID that I read with the informat 15., although it actually consists of 5 segments (I1-I5). When I run PROC SUMMARY NWAY with CLASS ID, some observations seem to have up to 275 rows, but if I run PROC SUMMARY NWAY with CLASS I1 I2 I3 I4 I5, the maximum number of rows per "CLASS" is 4 (which is correct). Merging with the original data set by ID works well if I use I1-I5 in the CLASS statement and add ID in an ID statement of PROC SUMMARY. It seems as if PROC SUMMARY uses a different precision when the variable is in the CLASS statement than when it is in the ID statement. Is this something that is obvious and that I should know/understand? It probably is 😉
/Ulf
It sounds like the concept you need to know/understand is the automatic variable _TYPE_ that is part of the output data set. Find out how it relates to the CLASS statement, as well as the NWAY option. That likely answers all your questions.
Good luck.
No, I don't think so, because it seems to be related to the numeric precision. If you try this simple example:
data a;
input @1 identity 15. @1 i1 2. @3 i2 6. @9 i3 4. @13 i4 1. @14 i5 2. @16 code 2.;
datalines;
00110129200720301
00110129200720302
00110129200720303
00110129200720401
00110129200720402
00110129200720403
00110129200720404
;
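/* Sort so that the later merges BY identity are valid. */
proc sort data=a; by identity; run;
/* Case 1: classify on the five segments and carry identity along via ID. */
proc summary nway data=a; class i1 i2 i3 i4 i5; var code;
output out=ut1 n=number; id identity;
run;
proc print data=ut1; run;
data b1; merge a ut1; by identity; run;
proc print data=b1; run;
/* Case 2: classify directly on the 15-digit numeric identity. */
proc summary nway data=a; class identity; var code;
output out=ut2 n=number;
run;
proc print data=ut2; run;
data b2; merge a ut2; by identity; run;
proc print data=b2; run;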
You will see that proc summary cannot handle identity with that length in the class statement, but in the id statement it works OK, since merging on that variable provides the correct match. A bit weird that the precision seems to be different within the same procedure?!
One solution (thanks SASKiwi) seems to be to read identity as $15. But, I still think it is weird 😉
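For reference, a minimal sketch of that character approach, reusing the column layout and data lines from the example above. The data set names a_char and ut3 are made up for illustration and do not appear elsewhere in the thread.

```sas
/* Sketch: read the 15-digit key as character instead of numeric. */
/* a_char and ut3 are hypothetical names used only for this demo. */
data a_char;
  input @1 identity $15. @16 code 2.;
  datalines;
00110129200720301
00110129200720302
00110129200720303
00110129200720401
00110129200720402
00110129200720403
00110129200720404
;
run;

proc summary nway data=a_char;
  class identity;            /* class levels are now 15-character strings */
  var code;
  output out=ut3 n=number;
run;

proc print data=ut3; run;
```

With the key stored as text, the CLASS levels should line up with the distinct 15-character strings, and the segments can still be picked out with SUBSTR if needed.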
You might also want to consider whether it is sensible to store a 15-digit classification variable as a number. 15 digits is getting near the limit of the 8 byte numeric precision of SAS, so it could make more sense to store this as a character variable which would make identifying your 5 segments easier.
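To put a number on "near the limit", here is a small sketch, assuming a platform that stores numerics as IEEE 8-byte doubles (Windows and Unix hosts do). CONSTANT('EXACTINT') returns the largest integer up to which every integer is stored exactly; the variable names are only illustrative.

```sas
/* Sketch: compare a 15-digit key with the exact-integer limit. */
data _null_;
  exact = constant('exactint');   /* 2**53 on IEEE platforms */
  id    = 1101292007203;          /* numeric key from the example, leading zeros dropped */
  big15 = 999999999999999;        /* largest possible 15-digit integer */
  fits  = (big15 <= exact);       /* 1 if every 15-digit integer is stored exactly */
  put exact= 20.;
  put id= 15.;
  put fits=;
run;
```

On such a platform any 15-digit integer is still stored exactly, while keys of 16 or more digits may not be, which is one more argument for a character key if the identifier could ever grow.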
Thanks, that is indeed a solution to my problem! However, I still think it is a bit weird that proc summary handles the numeric precision differently depending on whether the variable is in the class statement or in the id statement (see another answer for an example).
Not necessarily weird so much as just unexpected, when you consider what happens behind the scenes with the two statements. In the ID statement, values are moved as literals and attached to the output. In the CLASS statement, values are compared for equality at 8-byte precision in order to classify each observation.
At least that is how I was taught about it. Might be wrong though...
Steve Denham
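A related point that may be worth checking, offered as an assumption rather than a confirmed diagnosis: PROC SUMMARY forms CLASS levels from formatted values, and a numeric variable with no format attached falls back to BEST12., which is too narrow to show 15 digits distinctly. If that is what is happening here, either attaching a wide enough format or requesting grouping on internal values should restore the expected levels. The sketch below reuses data set a from the example; ut4 and ut5 are hypothetical output names.

```sas
/* Option 1: attach a format wide enough for all 15 digits. */
proc summary nway data=a;
  class identity;
  format identity 15.;
  var code;
  output out=ut4 n=number;
run;

/* Option 2: group on the stored (internal) values, ignoring formats. */
proc summary nway data=a;
  class identity / groupinternal;
  var code;
  output out=ut5 n=number;
run;

/* Print with a 15. format so the full key is visible. */
proc print data=ut4; format identity 15.; run;
proc print data=ut5; format identity 15.; run;
```

If both outputs show one row per distinct identity, the difference between CLASS and ID would come down to formatting of the class levels rather than to the stored precision.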