u_eson
Calcite | Level 5

I have a numeric variable ID which I read with the informat 15., although it actually consists of 5 segments (I1-I5). When I run PROC SUMMARY NWAY with CLASS ID, some classes seem to contain up to 275 observations, but if I run PROC SUMMARY NWAY with CLASS I1 I2 I3 I4 I5, the maximum number of observations per class is 4 (which is correct). Merging the result back onto the original data set by ID works well if I use I1-I5 in the CLASS statement of PROC SUMMARY and add ID in an ID statement. It seems as if PROC SUMMARY uses a different precision when the variable is in the CLASS statement than when it is in the ID statement. Is this something obvious that I should know/understand? It probably is 😉

/Ulf

Astounding
PROC Star

It sounds like the concept you need to know/understand is the automatic variable _TYPE_ that is part of the output data set.  Find out how it relates to the CLASS statement, as well as the NWAY option.  That likely answers all your questions.

Good luck.
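To illustrate what _TYPE_ records, here is a minimal sketch with a hypothetical data set `demo` (class variables `a` and `b`, analysis variable `x`). Without NWAY, PROC SUMMARY writes one row for the grand total (_TYPE_=0), rows for each single class variable (_TYPE_=1 for `b` alone, 2 for `a` alone) and rows for the full combination (_TYPE_=3). NWAY keeps only the rows with the highest _TYPE_, i.e. the full combination of all CLASS variables.

```sas
/* Hypothetical example data: two class variables and one analysis variable */
data demo;
  input a b x;
  datalines;
1 1 10
1 2 20
2 1 30
2 2 40
;
run;

/* Without NWAY: _TYPE_=0 (overall), 1 (b only), 2 (a only), 3 (a*b) */
proc summary data=demo;
  class a b;
  var x;
  output out=types n=number;
run;

proc print data=types;
run;

/* With NWAY: only the _TYPE_=3 rows, one per a*b combination */
proc summary data=demo nway;
  class a b;
  var x;
  output out=types_nway n=number;
run;

proc print data=types_nway;
run;
```

With this data the first output should have 1 + 2 + 2 + 4 = 9 rows and the NWAY output 4 rows.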

u_eson
Calcite | Level 5

No, I don't think so, because it seems to be related to the numeric precision. If you try this simple example:

data a;
  /* identity spans columns 1-15; i1-i5 are its five segments; code is in columns 16-17 */
  input @1 identity 15. @1 i1 2. @3 i2 6. @9 i3 4. @13 i4 1. @14 i5 2. @16 code 2.;
  datalines;
00110129200720301
00110129200720302
00110129200720303
00110129200720401
00110129200720402
00110129200720403
00110129200720404
;
run;

proc sort data=a;
  by identity;
run;

/* Classify by the five segments and carry identity along with an ID statement */
proc summary nway data=a;
  class i1 i2 i3 i4 i5;
  var code;
  output out=ut1 n=number;
  id identity;
run;

proc print data=ut1;
run;

data b1;
  merge a ut1;
  by identity;
run;

proc print data=b1;
run;

/* Classify directly by identity */
proc summary nway data=a;
  class identity;
  var code;
  output out=ut2 n=number;
run;

proc print data=ut2;
run;

data b2;
  merge a ut2;
  by identity;
run;

proc print data=b2;
run;

You will see that PROC SUMMARY cannot handle identity at that length in the CLASS statement, but in the ID statement it works fine, since merging on that variable gives the correct match. A bit weird that the precision seems to be different within the same procedure?!
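One way to look at what the CLASS statement actually sees: PROC MEANS/SUMMARY forms CLASS groups from formatted values, and, as far as I know, a numeric variable with no format is formatted with BEST12. In the example, identity takes only two distinct values (the first 15 characters of each line), and both have 13 significant digits, more than BEST12. can show. The hypothetical data set `chk` below puts the BEST12. and 15. versions of those two values side by side.

```sas
/* The two distinct identity values from the example (columns 1-15) */
data chk;
  input @1 identity 15.;
  as_best12 = put(identity, best12.); /* assumed default for unformatted numeric CLASS variables */
  as_15     = put(identity, 15.);     /* wide enough to show every digit */
  datalines;
00110129200720301
00110129200720401
;
run;

proc print data=chk;
run;
```

If the two rows show the same `as_best12` text but different `as_15` text, the values themselves are stored exactly and only their 12-character display collapses them into one class.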

One solution (thanks, SASKiwi) seems to be to read identity as $15. But I still think it is weird 😉
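A minimal sketch of that workaround on the same example data, assuming only identity and code are needed (the data set names `a_char` and `ut_char` are hypothetical). A character CLASS variable without a format is grouped on its full 15-character value, so each distinct ID becomes its own class.

```sas
/* Read identity as character instead of numeric */
data a_char;
  input @1 identity $15. @16 code 2.;
  datalines;
00110129200720301
00110129200720302
00110129200720303
00110129200720401
00110129200720402
00110129200720403
00110129200720404
;
run;

proc summary nway data=a_char;
  class identity;
  var code;
  output out=ut_char n=number;
run;

proc print data=ut_char;
run;
```

This should give two rows, with 3 and 4 observations, matching the CLASS I1-I5 result.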

SASKiwi
PROC Star

You might also want to consider whether it is sensible to store a 15-digit classification variable as a number. Fifteen digits is getting near the limit of SAS's 8-byte numeric precision, so it could make more sense to store it as a character variable, which would also make identifying your 5 segments easier.
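A short sketch of both points, assuming the segment layout from the example in this thread (columns 1-2, 3-8, 9-12, 13 and 14-15); the data set name `seg` is hypothetical. CONSTANT('EXACTINT') returns the largest integer an 8-byte SAS numeric stores exactly (9,007,199,254,740,992 on Windows and UNIX hosts), and SUBSTR extracts the five segments from a character ID.

```sas
/* Largest exactly representable integer for 8-byte numerics on this host */
data _null_;
  exact_limit = constant('exactint');
  put exact_limit= comma25.;
run;

/* Store the ID as character and derive the five segments from it */
data seg;
  input @1 identity $15.;
  length i1 $2 i2 $6 i3 $4 i4 $1 i5 $2;
  i1 = substr(identity, 1, 2);
  i2 = substr(identity, 3, 6);
  i3 = substr(identity, 9, 4);
  i4 = substr(identity, 13, 1);
  i5 = substr(identity, 14, 2);
  datalines;
00110129200720301
00110129200720404
;
run;

proc print data=seg;
run;
```

Keeping the segments as character also preserves leading zeros such as the "00" in the first segment.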

u_eson
Calcite | Level 5

Thanks, that is indeed a solution to my problem! However, I still think it is a bit weird that PROC SUMMARY handles numeric precision differently depending on whether the variable is in the CLASS statement or the ID statement (see my other reply for an example).

SteveDenham
Jade | Level 19

Not necessarily weird so much as just unexpected, when you consider what happens behind the scenes with the two statements. In the ID statement, values are moved as literals and attached. In the CLASS statement, values are aggregated, using the 8-byte precision to test equality when classifying an observation.

At least that is how I was taught about it.  Might be wrong though...

Steve Denham
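A hedged way to test where the grouping difference comes from: PROC MEANS/SUMMARY builds CLASS groups from formatted values, and the CLASS statement option GROUPINTERNAL tells it to group on the internal (unformatted) values instead. Assigning a format wide enough for every digit, such as 15., should have a similar effect. The data set names below are hypothetical. If both runs return two classes (3 and 4 observations) instead of one collapsed class, the difference would come from formatting rather than from the 8-byte storage itself.

```sas
/* Same example data, numeric identity as in the original question */
data a_num;
  input @1 identity 15. @16 code 2.;
  datalines;
00110129200720301
00110129200720302
00110129200720303
00110129200720401
00110129200720402
00110129200720403
00110129200720404
;
run;

/* Option 1: group on internal numeric values instead of formatted values */
proc summary nway data=a_num;
  class identity / groupinternal;
  var code;
  output out=ut_internal n=number;
run;

/* Option 2: give identity a format wide enough to show every digit */
proc summary nway data=a_num;
  class identity;
  format identity 15.;
  var code;
  output out=ut_format n=number;
run;

proc print data=ut_internal;
run;

proc print data=ut_format;
run;
```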
