BookmarkSubscribeRSS Feed
Recep
Quartz | Level 8

Hello everyone,

I have been using SAS number of years now. I would not consider myself an expert but I know a lot of basic stuff and the first time since I started using SAS I faced this problem. I imported a .csv file into SAS and saved it as a SAS dataset. I am trying to find a particular value within a numeric field in that dataset and in the menu bar I go to the Data and select "Where" and type in the value I'm looking for (it's a7-digit integer) and I'm getting no row where that field is equal to the value I'm trying to find. Under normal circumstances this would not raise any flag for me thinking that that value should not be in the dataset but I know that value for that field is in the dataset. In fact if I go to the dataset and look for it row by row I can see that value is there. I'm perplexed. Anybody can think of any reason why?

I do the same search with the data function below:

data test1;
set dad_health_geocoded;

if rowid_dad=7356228;

run;

 

And here is the log:

There were 114295 observations read from the data set WORK.DAD_HEALTH_GEOCODED.
NOTE: The data set WORK.TEST1 has 0 observations and 19 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

 

I would appreciate any direction/help!

 

Regards,

 

Recep

4 REPLIES 4
Patrick
Opal | Level 21

May be a precision issue with the data grid not showing you the full value.

What happens if you round the variable to an integer value?

data test1;
  set dad_health_geocoded;
  if round(rowid_dad,1)=7356228;
run;

 

Here how things look like for me using EG8.2

Capture.JPG 

Tom
Super User Tom
Super User

Doesn't the GUI tool you are using to look at the data show you the observation number? Let's assume it is observation number 12,345.

 

data test1;
  set dad_health_geocoded firstobs=12345 obs=12345;
run;
FreelanceReinh
Jade | Level 19

@Recep wrote:

I am trying to find a particular value within a numeric field in that dataset and in the menu bar I go to the Data and select "Where" and type in the value I'm looking for (it's a7-digit integer) and I'm getting no row (...). In fact if I go to the dataset and look for it row by row I can see that value is there.

Hello @Recep,

 

Note that what you see are always formatted numeric values, whereas the computer "sees" only the internal binary representations of the numeric values, in particular when it applies WHERE or IF conditions. Unfortunately (for us humans), these two are different except in the rare cases when a format such as BINARY64. or HEX16. is used which reveals the internal binary representation.

 

So, one potential issue is that there's a format associated with variable rowid_dad in the dataset, for example 8., which would round the number to an integer if the original value had decimal places (like 7356227.8 or 7356228.4). As you certainly know, you can check this in PROC CONTENTS output. But the formatting could also be due to some default format that is tacitly applied when you "look into" a dataset, be it in the data grid or with PROC PRINT, etc. As @Patrick has pointed out, small deviations from the exact integer value could be hidden by the default format. A typical reason for such small deviations are rounding errors from calculations.

 

In fact, under Windows or Unix there are six different numeric values x (i.e. internal binary representations) which are displayed in virtually all common numeric formats (e.g. BEST12., BEST16., BEST32., E32., 16.8, 32.24, etc.) as if they were equal to 7356228, but only one of them does satisfy the condition x=7356228. It is the third of the six values shown below (highlighted in green).

 

data test;
input x hex16.;
b32=put(x, best32.);
b16=put(x, best16.);
b12=put(x, best12.);
e32=put(x, e32.);
w32=put(x, 32.24);
w16=put(x, 16.8);
bin=put(x, binary64.);
hex=put(x, hex16.);
cards;
415C0FD0FFFFFFFE
415C0FD0FFFFFFFF
415C0FD100000000
415C0FD100000001
415C0FD100000002
415C0FD100000003
;

proc print data=test;
run;

 

Output:

Obs      x         b32        b16        b12                    e32

 1    7356228    7356228    7356228    7356228    7.3562280000000000000000000E+06
 2    7356228    7356228    7356228    7356228    7.3562280000000000000000000E+06
 3    7356228    7356228    7356228    7356228    7.3562280000000000000000000E+06
 4    7356228    7356228    7356228    7356228    7.3562280000000000000000000E+06
 5    7356228    7356228    7356228    7356228    7.3562280000000000000000000E+06
 6    7356228    7356228    7356228    7356228    7.3562280000000000000000000E+06

Obs                 w32                         w16

 1    7356228.000000000000000000000000    7356228.00000000
 2    7356228.000000000000000000000000    7356228.00000000
 3    7356228.000000000000000000000000    7356228.00000000
 4    7356228.000000000000000000000000    7356228.00000000
 5    7356228.000000000000000000000000    7356228.00000000
 6    7356228.000000000000000000000000    7356228.00000000

Obs                                 bin                                         hex

 1    0100000101011100000011111101000011111111111111111111111111111110    415C0FD0FFFFFFFE
 2    0100000101011100000011111101000011111111111111111111111111111111    415C0FD0FFFFFFFF
 3    0100000101011100000011111101000100000000000000000000000000000000    415C0FD100000000
 4    0100000101011100000011111101000100000000000000000000000000000001    415C0FD100000001
 5    0100000101011100000011111101000100000000000000000000000000000010    415C0FD100000002
 6    0100000101011100000011111101000100000000000000000000000000000011    415C0FD100000003

 

To find out the true value of rowid_dad, display its internal representation using the HEX16. or BINARY64. format. Use the observation number (as @Tom has mentioned) or other variables to identify the record containing the value in question.

 

Example:

proc print data=have(firstobs=12345 obs=12345);
format rowid_dad hex16.;
var rowid_dad;
run;

Compare the result to the "standard representation" of the integer, i.e. 415C0FD100000000 (in HEX16. format) in the case of 7356228, and you will see the difference. Other formats, e.g. BEST16. (or even the default format obtained with format rowid_dad; in the PROC PRINT step), will be useful if there's a "larger" deviation from 7356228 like 7356228.3.

 

Once you know the true value, you can adapt the WHERE or IF condition accordingly. The ROUND function (suggested by @Patrick) is an important tool for this purpose in many cases.

 

Recep
Quartz | Level 8

Thanks a lot for all the responses! I'll look into them today and post what I find based on the recommendations. Please follow it.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 2009 views
  • 0 likes
  • 4 in conversation