BookmarkSubscribeRSS Feed
andypandy_swe
Obsidian | Level 7

I've got a derived variable in a dataset called AGE_DECADE defined as length=3 format=3.

This variable is created from a another variable called AGE with this code:

AGE_DECADE=10*int(AGE/10);

 

So

if AGE is between 0 and 9 then AGE_DECADE is set to 0.

if AGE is between 10 and 19 then AGE_DECADE is set to 10

if AGE is between 20 and 29 then AGE_DECADE is set to 20 and so on...

 

If I export this dataset to SPSS using proc export (dbms=sav) and then do a simple frequency count on AGE_DECADE I get two rows with AGE_DECADE=0!

 

In my case I get 14 instances of AGE_DECADE=0 and also 40 instances of AGE_DECADE=0.

I've increased the number of decimals in SPSS but there is no difference between the two that I can see.

A frequency count in SAS results correctly in 54 instances of AGE_DECADE=0.

 

When I check the age variable there are 14 with AGE=0 and the other 40 rows have AGE between 1 and 9.

 

So somehow "10*int(AGE/10)" results in one type of zero if AGE=0 and a different type of zero if AGE=1-9.

 

Could someone explain this to me?

 

By the way, I solved the problem by:

if AGE >= 0 and AGE <= 9 then AGE_DECADE=0;
else if AGE ne . then AGE_DECADE=10*int(d_age/10);

 

This works (I get only one frequency count for AGE_DECADE in SPSS) but I'd really like to know whats going on...

 

7 REPLIES 7
art297
Opal | Level 21

Interesting question! Actually, in this case, SAS is treating them the same, while SPSS is correctly (IMHO) treating zero values differently. Possibly someone from SAS Institute can chime in with an explanation.

 

Run the following and look at the results of the proc freq:

data have;
  input dob date9.;
  if _n_ eq 2 then age=0;
  else age=yrdif(dob,today());
  AGE_DECADE=10*int(AGE/10);
  hage_decade=put(AGE_DECADE,hex16.);
  cards;
4apr2007
.
4apr2008
4apr2009
4apr2010
4apr2011
4apr2012
4apr2013
4apr2014
4apr2015
4apr2016
4apr2017
;
proc freq data=have;
  tables age_decade hage_decade;
run;

Art, CEO, AnalystFinder.com

 

andypandy_swe
Obsidian | Level 7
Curious as to why you would say that SPSS treats the zeroes correctly? I guess as a developer (and not a statistician) I feel it is obvious that the intention of the code is to bin the age into decade containers... Why would a zero not bin into zero when a ten bins into 1 and twenty into 2?

I’ve got a feeling I’m about to get schooled but I really want to know 🙂
art297
Opal | Level 21

As neither a developer, mathematician or statistician, my explanation could be totally wrong, which is why I was hoping that a developer from SAS Institute would have chimed in.

 

@ChrisHemedinger: Know anyone who could address this?

 

However, that said, I have spent numerous hours getting SAS to perform as expected when numeric precision showed its ugly head. Interestingly, in this case, SAS is ignoring the fuzz factor without being told to do so.

 

Are the calculations "precisely" the same, given your formula [i.e.,AGE_DECADE=10*int(AGE/10) ] when age is 0 and age is between 1 and 10. The answer is no if you are considering a number represented in 32 bits.

 

I think you will get the same answers in SAS and SPSS if you use the formula:

AGE_DECADE=fuzz(10*int(AGE/10))

I'm just surprised that SAS ignores the difference in this case when the values are clearly different when viewed in Hex16. format.

 

Art, CEO, AnalystFinder.com

 

ChrisNZ
Tourmaline | Level 20

The first bit, which differs as shown by @art297 , is the sign bit (see http://support.sas.com/kb/31/437.html).

 

So   10*int(AGE/10) returns -0 for some (probably precision) reason. 

 

There are several ways to obtain a +0 result:

 

data HAVE;
  AGE=0;
  AGE_DECADE1=      10*int(AGE/10)      ;
  AGE_DECADE2=round(10*int(AGE/10) )    ;
  AGE_DECADE3=      10*int(AGE/10) -0   ;
  AGE_DECADE4=      10*int(AGE/10) +1-1 ;
  AGE_DECADE5=      10*int(AGE/10) *1   ;
  AGE_DECADE6=      10*int(AGE/10) *2/2 ;
  AGE_DECADE7=abs(  10*int(AGE/10) )    ;
  format _ALL_ binary64.;
run;

yields:

 

AGE_DECADE1 1000000000000000000000000000000000000000000000000000000000000000
AGE_DECADE2 0000000000000000000000000000000000000000000000000000000000000000
AGE_DECADE3 1000000000000000000000000000000000000000000000000000000000000000
AGE_DECADE4 0000000000000000000000000000000000000000000000000000000000000000
AGE_DECADE5 1000000000000000000000000000000000000000000000000000000000000000
AGE_DECADE6 1000000000000000000000000000000000000000000000000000000000000000
AGE_DECADE7 0000000000000000000000000000000000000000000000000000000000000000

 

SAS does not differentiate between +0 and -0 so proc freq or proc compare (see below) group them together.

SPSS treats the 2 values as different.

 

proc compare data=HAVE compare=HAVE(keep=AGE_DECADE1 rename=(AGE_DECADE1=AGE)); run;

NOTE: No unequal values were found. All values compared are exactly equal. 

 

Since SAS does not differentiate ( and in this case gets it wrong: there is no reason the result should be negative here), I would argue that SAS should never store the value -0 in its data sets.

 

 

ChrisHemedinger
Community Manager

Other programming languages, not just SAS, have the concept of -0 (a little mind bending when you think about it).  The representation difference is the sign bit, but SAS (as @ChrisNZ says) treats the two values as equal -- as do most other programming languages.

 

I've written about this here: What's the difference between 0 and -0?

Learn from the Experts! Check out the huge catalog of free sessions in the Ask the Expert webinar series.
art297
Opal | Level 21

Thanks @ChrisNZ and @ChrisHemedinger!

 

Usually learn something new every day but, today, guess that's -nothing!

 

@andypandy_swe: So, like I said before, use:

AGE_DECADE=fuzz(10*int(AGE/10));

in your SAS code (before exporting the file), then you'll get the same results in both SAS and SPSS.

 

Art, CEO, AnalystFinder.com

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1266 views
  • 4 likes
  • 4 in conversation