BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lydiawawa
Lapis Lazuli | Level 10

Hi All,

 

I'm trying to subset a dataset by using where clause on a variable, v_h, with the following format.

 

v_h :  type = Num, Len = 8, Format = 4.1, Informat = 4.1

 

When using proc freq on v_h, there are records of v_h = 1.4, but when using where clause to filter the dataset by v_h = 1.4, there are 0 records

/*returned 0 records*/ 
data test; set have(where=(v_h=1.4)); run; data test; set have; if v_h = 1.4; run;

I also received a message :

"NOTE: Data file have.DATA is in a format that is native to another host, or the file encoding
does not match the session encoding. Cross Environment Data Access will be used, which might require
additional CPU resources and might reduce performance."

 

Could anyone help me to figure out what went wrong?

 

Thanks!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Numeric precision, some decimal values don't always store cleanly in a binary form could be an issue and possibly compounded by the CEDA.

 

Try using either

 

if round(v_h,0.1) = 1.4;

or an equivalent Where.

And sometimes you may want to use something like: If put(var, formatname.) = "formatted value" as some formats can do some pretty interesting things to specific values.

 

Note: you do need to consider Formated value versus actual sometimes, especially when looking at Proc Freq values to select things.

Consider:

data example;
   x=1.423;
   format x 4.1;
run;
proc freq data=example;
run;

The output shows "1.4" in the Proc Freq output but the value is NOT 1.4.

View solution in original post

4 REPLIES 4
ballardw
Super User

Numeric precision, some decimal values don't always store cleanly in a binary form could be an issue and possibly compounded by the CEDA.

 

Try using either

 

if round(v_h,0.1) = 1.4;

or an equivalent Where.

And sometimes you may want to use something like: If put(var, formatname.) = "formatted value" as some formats can do some pretty interesting things to specific values.

 

Note: you do need to consider Formated value versus actual sometimes, especially when looking at Proc Freq values to select things.

Consider:

data example;
   x=1.423;
   format x 4.1;
run;
proc freq data=example;
run;

The output shows "1.4" in the Proc Freq output but the value is NOT 1.4.

FreelanceReinh
Jade | Level 19

Hi @lydiawawa,

 

Just to add: Even without an explicit format (such as 4.1 in your case) PROC FREQ would display many different numeric values as "1.4" due to rounding:

data test;
do i=1 to 450362;
  v_h=input('3FF66666666'||put(i+194248,hex5.), hex16.);
  output;
end;
run;

proc freq data=test;
tables v_h;
run;

Result:

                                         Cumulative    Cumulative
         v_h    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------------
1.3999999999           1        0.00             1         0.00
         1.4      450360      100.00        450361       100.00
1.4000000001           1        0.00        450362       100.00

So, 450360 different values (under Windows) are counted as 1.4, only one of which satisfies the condition v_h=1.4. [Edit: Thanks to @Watts's explanation we could have calculated that number of different values in advance: The real numbers whose values rounded to 10 decimals equal 1.4 form an interval of length 1E-10. The least significant bit in the internal binary floating-point representation of these numbers has a place value of 2**-52. And finally round(1E-10/2**-52)=450360.]

 

Some of those "different" values can easily result from computations involving decimal fractions:

601  data bad;
602  v_h=0.14*10;
603  put v_h=;
604  if v_h=1.4;
605  run;

v_h=1.4
NOTE: The data set WORK.BAD has 0 observations and 1 variables.

Again, zero observations, in spite of the implicitly rounded value 1.4 from the PUT statement. A slight adjustment like

if round(v_h, 1e-10)=1.4;

would have avoided the error.

Watts
SAS Employee

Yes, PROC FREQ groups observations into levels by using the formatted values. By default (if a format isn't specified), the format of a numeric variable is BEST12. 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 850 views
  • 5 likes
  • 4 in conversation