Hi All,
I'm trying to subset a dataset by using where clause on a variable, v_h, with the following format.
v_h : type = Num, Len = 8, Format = 4.1, Informat = 4.1
When using proc freq on v_h, there are records of v_h = 1.4, but when using where clause to filter the dataset by v_h = 1.4, there are 0 records
/*returned 0 records*/
data test;
set have(where=(v_h=1.4));
run;
data test;
set have;
if v_h = 1.4;
run;
I also received a message :
"NOTE: Data file have.DATA is in a format that is native to another host, or the file encoding
does not match the session encoding. Cross Environment Data Access will be used, which might require
additional CPU resources and might reduce performance."
Could anyone help me to figure out what went wrong?
Thanks!
Numeric precision, some decimal values don't always store cleanly in a binary form could be an issue and possibly compounded by the CEDA.
Try using either
if round(v_h,0.1) = 1.4;
or an equivalent Where.
And sometimes you may want to use something like: If put(var, formatname.) = "formatted value" as some formats can do some pretty interesting things to specific values.
Note: you do need to consider Formated value versus actual sometimes, especially when looking at Proc Freq values to select things.
Consider:
data example; x=1.423; format x 4.1; run; proc freq data=example; run;
The output shows "1.4" in the Proc Freq output but the value is NOT 1.4.
Numeric precision, some decimal values don't always store cleanly in a binary form could be an issue and possibly compounded by the CEDA.
Try using either
if round(v_h,0.1) = 1.4;
or an equivalent Where.
And sometimes you may want to use something like: If put(var, formatname.) = "formatted value" as some formats can do some pretty interesting things to specific values.
Note: you do need to consider Formated value versus actual sometimes, especially when looking at Proc Freq values to select things.
Consider:
data example; x=1.423; format x 4.1; run; proc freq data=example; run;
The output shows "1.4" in the Proc Freq output but the value is NOT 1.4.
Hi @lydiawawa,
Just to add: Even without an explicit format (such as 4.1 in your case) PROC FREQ would display many different numeric values as "1.4" due to rounding:
data test;
do i=1 to 450362;
v_h=input('3FF66666666'||put(i+194248,hex5.), hex16.);
output;
end;
run;
proc freq data=test;
tables v_h;
run;
Result:
Cumulative Cumulative v_h Frequency Percent Frequency Percent ----------------------------------------------------------------- 1.3999999999 1 0.00 1 0.00 1.4 450360 100.00 450361 100.00 1.4000000001 1 0.00 450362 100.00
So, 450360 different values (under Windows) are counted as 1.4, only one of which satisfies the condition v_h=1.4. [Edit: Thanks to @Watts's explanation we could have calculated that number of different values in advance: The real numbers whose values rounded to 10 decimals equal 1.4 form an interval of length 1E-10. The least significant bit in the internal binary floating-point representation of these numbers has a place value of 2**-52. And finally round(1E-10/2**-52)=450360.]
Some of those "different" values can easily result from computations involving decimal fractions:
601 data bad; 602 v_h=0.14*10; 603 put v_h=; 604 if v_h=1.4; 605 run; v_h=1.4 NOTE: The data set WORK.BAD has 0 observations and 1 variables.
Again, zero observations, in spite of the implicitly rounded value 1.4 from the PUT statement. A slight adjustment like
if round(v_h, 1e-10)=1.4;
would have avoided the error.
Yes, PROC FREQ groups observations into levels by using the formatted values. By default (if a format isn't specified), the format of a numeric variable is BEST12.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.