The documentation for the SET statement tells us that NOBS= option initializes a variable "whose value is usually the total number of observations in the input data set or data sets". The inclusion of the word "usually" is notable. The text goes on state:
In fact there are other circumstances which completely prevent NOBS from working or cause it to produce possibly unexpected results.
Here are two native SAS data files which will be used to illustrate the issues.
data elevens twelves ; set sashelp.class ; keep name age sex ; select (age) ; when (11) output elevens ; when (12) output twelves ; otherwise ; end ; run ;
ELEVENS:
Obs Name Sex Age 1 Joyce F 11 2 Thomas M 11
TWELVES:
Obs Name Sex Age 1 James M 12 2 Jane F 12 3 John M 12 4 Louise F 12 5 Robert M 12
Here is a trouble-free exercise of the NOBS option:
data _null_; put obscount=F16.0-L ; stop ; set twelves nobs=obscount ; run ;
The result is
obscount=5
Now exercise NOBS while processing a DATA step view:
data vv / view = vv ; set sashelp.class(where = (age EQ 12) ) ; run ; data _null_ ; put obscount=F16.0-L ; stop ; set vv nobs=obscount ; run ;
The result (on a Windows system) is
obscount=9007199254740992
Sequential engines also get into trouble. We can use the XML engine, which is sequential, as an example:
libname xmlLib xml 'c:\temp\xml.xml' ; data xmlLib.ss ; set sashelp.class ; run ; data _null_ ; put obscount=F16.0-L ; stop ; set xmlLib.ss nobs=obscount ; run ; libname xmlLib clear ;
The result:
obscount=9007199254740992
Another situation which defeats the OBS option is third-party data base engines. Excel, though not a data base, shares the behavior in question. To illustrate, run
libname xlLib excel 'c:\temp\xlBook.xls' ; proc datasets library=xlLib nolist ; delete Sheet1 ; run ; quit ; data xlLib.Sheet1 ; set sashelp.class ; run ; data _null_ ; put obscount=F16.0-L ; stop ; set xlLib.Sheet1 nobs=obscount ; run ; libname xlLib clear ;
The result is
obscount=9007199254740992
A final circumstance preventing NOBS from working is the OPEN=DEFER option, as in
data _null_ ; put obscount=F16.0-L ; stop ; set elevens twelves open=defer nobs=obscount ; run ;
The result is
obscount=9007199254740992
Now consider circumstances in which NOBS works, but produces possibly unexpected or unwanted results. Specifically, delete some observations from one of the data sets:
data twelves ; modify twelves ; if sex EQ 'F' then remove ; run ; proc print data=twelves ; run ;
This is the output:
Obs Name Sex Age 1 James M 12 3 John M 12 5 Robert M 12
Notice the presentation of the original observation numbers. That is one clue that the deleted observations are still there. Indeed, exercising NOBS as in
data _null_ ; put obscount=F16.0-L ; stop ; set twelves nobs=obscount ; run ;
yields
obscount=5
NOBS has no interaction with WHERE processing. Consequently, a count generated using NOBS will include all observations. It will not exclude observations which the WHERE condition excludes. Here is an example:
data _null_; put obscount=F16.0-L ; stop ; set sashelp.class(where = (age EQ 12) ) nobs=obscount ; run ;
The result is
obscount=19
The large integers indicating the inability of the NOBS option to operate can be replaced by special missing values:
data _null_ ; obscount = ifn ( obscount EQ constant('ExactInt') , (.N) , obscount ) ; put obscount=F16.0-L ; stop ; set vv nobs=obscount ; run ;
Result:
obscount=N
This represents perhaps a slight improvement in the situation.
To get a logical count of observations (that is, excluding deleted observations), use the ATTRN function instead of the NOBS option, as in
data _null_ ; dsid = open('twelves') ; obscount = attrn(dsid,'nlobs') ; rc = close(dsid) ; put obscount=F16.0-L ; run ;
The result is
obscount=3
Some of the issues illustrated above affect the MODIFY statement as well as the SET statement.
See Jack Hamilton's paper How Many Observations Are In My Data Set? for a macro which provides a more bullet-proof way of generating observation counts.
Thanks to Howles for contributing this article to sasCommunity.org.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.