BookmarkSubscribeRSS Feed

SAS Tip: NOBS option limitations

Started ‎05-10-2019 by
Modified ‎05-15-2019 by
Views 24,116

The documentation for the SET statement tells us that NOBS= option initializes a variable "whose value is usually the total number of observations in the input data set or data sets". The inclusion of the word "usually" is notable. The text goes on state:

  • "The number of observations includes those observations that are marked for deletion but are not yet deleted."
  • "For certain SAS views, SAS cannot determine the number of observations. In these cases, SAS sets the value of the NOBS= variable to the largest positive integer value that is available in your operating environment."

In fact there are other circumstances which completely prevent NOBS from working or cause it to produce possibly unexpected results.

Here are two native SAS data files which will be used to illustrate the issues.

data elevens twelves ;
set sashelp.class ;
keep name age sex ;
select (age) ; 
   when (11) output elevens ;
   when (12) output twelves ;
   otherwise ;
   end ;
run ; 

ELEVENS:

Obs     Name     Sex    Age

 1     Joyce      F      11
 2     Thomas     M      11

TWELVES:

Obs     Name     Sex    Age

 1     James      M      12
 2     Jane       F      12
 3     John       M      12
 4     Louise     F      12
 5     Robert     M      12
 

Well-Behaved Operation of NOBS

Here is a trouble-free exercise of the NOBS option:

data _null_; 
put obscount=F16.0-L ;
stop ;
set twelves nobs=obscount ;
run ;

The result is

obscount=5

Obstacles

Views

Now exercise NOBS while processing a DATA step view:

data vv / view = vv ;
set sashelp.class(where = (age EQ 12) ) ;
run ; 

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set vv nobs=obscount ;
run ;

The result (on a Windows system) is

obscount=9007199254740992

Sequential Engines

Sequential engines also get into trouble. We can use the XML engine, which is sequential, as an example:

libname xmlLib xml 'c:\temp\xml.xml' ;

data xmlLib.ss ;
set sashelp.class ;
run ; 

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set xmlLib.ss nobs=obscount ;
run ; 

libname xmlLib clear ;

The result:

obscount=9007199254740992

Data Base Engines

Another situation which defeats the OBS option is third-party data base engines. Excel, though not a data base, shares the behavior in question. To illustrate, run

libname xlLib excel 'c:\temp\xlBook.xls' ; 

proc datasets library=xlLib nolist ;
delete Sheet1 ;
run ;
quit ;

data xlLib.Sheet1 ;
set sashelp.class ;
run ;

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set xlLib.Sheet1 nobs=obscount ;
run ; 

libname xlLib clear ;

The result is

obscount=9007199254740992

OPEN=DEFER

A final circumstance preventing NOBS from working is the OPEN=DEFER option, as in

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set elevens twelves open=defer nobs=obscount ;
run ; 

The result is

obscount=9007199254740992

Effect of Deleted Observations

Now consider circumstances in which NOBS works, but produces possibly unexpected or unwanted results. Specifically, delete some observations from one of the data sets:

data   twelves ;
modify twelves ;
if sex EQ 'F' then remove ;
run ;  

proc print data=twelves ;
run ;

This is the output:

Obs     Name     Sex    Age

 1     James      M      12
 3     John       M      12
 5     Robert     M      12

Notice the presentation of the original observation numbers. That is one clue that the deleted observations are still there. Indeed, exercising NOBS as in

data _null_ ; 
put obscount=F16.0-L ;
stop ;
set twelves nobs=obscount ;
run ; 

yields

obscount=5

NOBS and WHERE processing

NOBS has no interaction with WHERE processing. Consequently, a count generated using NOBS will include all observations. It will not exclude observations which the WHERE condition excludes. Here is an example:

data _null_; 
put obscount=F16.0-L ;
stop ;
set sashelp.class(where = (age EQ 12) ) nobs=obscount ;
run ;

The result is

obscount=19

Workarounds

The large integers indicating the inability of the NOBS option to operate can be replaced by special missing values:

data _null_ ; 
obscount = ifn (  obscount EQ constant('ExactInt')
                , (.N)
                , obscount
               ) ;
put obscount=F16.0-L ;
stop ;
set vv nobs=obscount ;
run ;

Result:

obscount=N

This represents perhaps a slight improvement in the situation.

To get a logical count of observations (that is, excluding deleted observations), use the ATTRN function instead of the NOBS option, as in

data _null_ ; 
dsid = open('twelves') ;
obscount = attrn(dsid,'nlobs') ;
rc = close(dsid) ;
put obscount=F16.0-L ;
run ;

The result is

obscount=3

Some of the issues illustrated above affect the MODIFY statement as well as the SET statement.

See Jack Hamilton's paper How Many Observations Are In My Data Set? for a macro which provides a more bullet-proof way of generating observation counts.

 

Thanks to Howles for contributing this article to sasCommunity.org.

Version history
Last update:
‎05-15-2019 04:54 PM
Updated by:

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels
Article Tags