- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 08-05-2009 10:26 PM
(2142 views)
hi,
how can i store the obs count of a dataset to a macro variable when i use proc datasets?
i know i saw a sample of this code back then but now i couldnt find it anymore 😞
thanks a lot!
how can i store the obs count of a dataset to a macro variable when i use proc datasets?
i know i saw a sample of this code back then but now i couldnt find it anymore 😞
thanks a lot!
7 REPLIES 7
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The two processes are inconsistent. There are SAS CALL functions and DICTIONARY members(s) and the SET stmt NOBS= parameter, used with a CALL SYMPUT for the purpose mentioned.
Scott Barry
SBBWorks, Inc.
Scott Barry
SBBWorks, Inc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
hi scott,
thanks for the info, but in what way are these approaches inconsistent?
is there any other way other than the select count into: statement in getting the number of obs of a certain dataset?
thanks,
milton
thanks for the info, but in what way are these approaches inconsistent?
is there any other way other than the select count into: statement in getting the number of obs of a certain dataset?
thanks,
milton
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes there is, and much more efficient too.
Both DICTIONARY tables method, PROC CONTENTS (you may have confused this with PROC DATASETS), or even a DATA STEP with the NOBS options will retrieve immediately the count of obs (EXCEPT if the table is a VIEW or DBMS table) WITHOUT the need to scan the whole data as the SELECT COUNT INTO: technique will need to do.
Check online documentation for DICTIONARY tables and the CONTENTS procedure:
http://support.sas.com/documentation/cdl/en/lrcon/61722/HTML/default/a002299818.htm
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/a000085768.htm
Cheers from Portugal.
Daniel Santos @ www.cgd.pt.
Both DICTIONARY tables method, PROC CONTENTS (you may have confused this with PROC DATASETS), or even a DATA STEP with the NOBS options will retrieve immediately the count of obs (EXCEPT if the table is a VIEW or DBMS table) WITHOUT the need to scan the whole data as the SELECT COUNT INTO: technique will need to do.
Check online documentation for DICTIONARY tables and the CONTENTS procedure:
http://support.sas.com/documentation/cdl/en/lrcon/61722/HTML/default/a002299818.htm
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/a000085768.htm
Cheers from Portugal.
Daniel Santos @ www.cgd.pt.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Daniel that's what I'm looking for.
Regards,
Milton
Regards,
Milton
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A succinct and efficient way is the use of macro and SCL functions.;
%LET DSID = %SYSFUNC(OPEN(TEMP.SAMPLE,IN));
%LET NUM = %SYSFUNC(ATTRN(&DSID,NLOBS));
%IF &DSID > 0 %THEN %LET RC=%SYSFUNC(CLOSE(&DSID));
%put There are &num observations in the dataset;
* The first line opens the data set and the last one closes it. this is needed because you are not using data step or
SCL and so could leave a data set open, causing problems later. The second line is what captures the number of observations
from the header of the data set using the SCL ATTRN function called by %SYSFUNC.
* with where clause;
%let dsid = %sysfunc(open(temp.sample (where=(x>500000))));
%let num = %sysfunc(attrn(&dsid,nlobs));
%if &DSID > 0 %then %let rc = %sysfunc(close(&dsid));
%LET DSID = %SYSFUNC(OPEN(TEMP.SAMPLE,IN));
%LET NUM = %SYSFUNC(ATTRN(&DSID,NLOBS));
%IF &DSID > 0 %THEN %LET RC=%SYSFUNC(CLOSE(&DSID));
%put There are &num observations in the dataset;
* The first line opens the data set and the last one closes it. this is needed because you are not using data step or
SCL and so could leave a data set open, causing problems later. The second line is what captures the number of observations
from the header of the data set using the SCL ATTRN function called by %SYSFUNC.
* with where clause;
%let dsid = %sysfunc(open(temp.sample (where=(x>500000))));
%let num = %sysfunc(attrn(&dsid,nlobs));
%if &DSID > 0 %then %let rc = %sysfunc(close(&dsid));
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Succint and with a where clause!
Introduced with SAS9 (iirc) attrn NLOBSF will provide the NOBS as fast as normal for a normal data set, and instead of missing, 0 or -1 for a dataset subject to a where clause, it will go and make the count! here is a log snippet of my demo, based on the code from rob@sas
238 data temp.sample ;
239 do x= 1 to 1e7 ;
240 ranp = 1+ nobs*ranuni(1) ;
241 classn = RANP ;
242 set sashelp.class point = ranp nobs= nobs;
243 output ;
244 end;
245 stop;
246 run;
NOTE: The data set TEMP.SAMPLE has 10000000 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 22.68 seconds
user cpu time 6.40 seconds
system cpu time 2.18 seconds
Memory 181k
247 %let start = %sysfunc( datetime(),best18 ) ;
248 %let dsid = %sysfunc(open(temp.sample ));
249 %let num = %sysfunc(attrn(&dsid,nlobsf));
250 %let rc = %sysfunc(close(&dsid));
251 %let here1 = %sysfunc( datetime(),best18 ) ;
252 %let durn1 = %sysevalf( &here1 - &start ) ;
253 %put without where rc=&rc dsid= &dsid num=&num durn= &durn1 %now ;
without where rc=0 dsid= 1 num=10000000 durn= 0 07AUG2009:12:33:38.681
254
255 %let star2 = %sysfunc( datetime(),best18 ) ;
256 %let dsid = %sysfunc(open(temp.sample (where=(x>500000))));
257 %let num = %sysfunc(attrn(&dsid,nlobsf));
258 %let rc = %sysfunc(close(&dsid));
259 %let here2 = %sysfunc( datetime(),best18 ) ;
260 %let durn2 = %sysevalf( &here2 - &star2 ) ;
261 %put with where rc=&rc dsid= &dsid num=&num durn= &durn2 %now ;
with where rc=0 dsid= 1 num=9500000 durn= 25.1400001049041 07AUG2009:12:34:03.821
With a WHERE clause, interestingly, as durn= a smaller number when the where clause returns a small number of rows, and larger when there are more to count ...
342 %let star2 = %sysfunc( datetime(),best18 ) ;
343 %let dsid = %sysfunc(open(temp.sample (where=(x>9500000))));
344 %let num = %sysfunc(attrn(&dsid,nlobsf));
345 %let rc = %sysfunc(close(&dsid));
346 %let here2 = %sysfunc( datetime(),best18 ) ;
347 %let durn2 = %sysevalf( &here2 - &star2 ) ;
348 %put with where rc=&rc dsid= &dsid num=&num durn= &durn2 %now ;
with where rc=0 dsid= 1 num=500000 durn= 3.375 07AUG2009:15:34:58.682
349 %let star2 = %sysfunc( datetime(),best18 ) ;
350 %let dsid = %sysfunc(open(temp.sample (where=(x>1))));
351 %let num = %sysfunc(attrn(&dsid,nlobsf));
352 %let rc = %sysfunc(close(&dsid));
353 %let here2 = %sysfunc( datetime(),best18 ) ;
354 %let durn2 = %sysevalf( &here2 - &star2 ) ;
355 %put with where rc=&rc dsid= &dsid num=&num durn= &durn2 %now ;
with where rc=0 dsid= 1 num=9999999 durn= 24.2490000724792 07AUG2009:15:35:41.119
it appears that where-clause handling is more efficient than the counting. ~~~~~~
PeterC Message was edited by: Peter.C
Introduced with SAS9 (iirc) attrn NLOBSF will provide the NOBS as fast as normal for a normal data set, and instead of missing, 0 or -1 for a dataset subject to a where clause, it will go and make the count! here is a log snippet of my demo, based on the code from rob@sas
238 data temp.sample ;
239 do x= 1 to 1e7 ;
240 ranp = 1+ nobs*ranuni(1) ;
241 classn = RANP ;
242 set sashelp.class point = ranp nobs= nobs;
243 output ;
244 end;
245 stop;
246 run;
NOTE: The data set TEMP.SAMPLE has 10000000 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 22.68 seconds
user cpu time 6.40 seconds
system cpu time 2.18 seconds
Memory 181k
247 %let start = %sysfunc( datetime(),best18 ) ;
248 %let dsid = %sysfunc(open(temp.sample ));
249 %let num = %sysfunc(attrn(&dsid,nlobsf));
250 %let rc = %sysfunc(close(&dsid));
251 %let here1 = %sysfunc( datetime(),best18 ) ;
252 %let durn1 = %sysevalf( &here1 - &start ) ;
253 %put without where rc=&rc dsid= &dsid num=&num durn= &durn1 %now ;
without where rc=0 dsid= 1 num=10000000 durn= 0 07AUG2009:12:33:38.681
254
255 %let star2 = %sysfunc( datetime(),best18 ) ;
256 %let dsid = %sysfunc(open(temp.sample (where=(x>500000))));
257 %let num = %sysfunc(attrn(&dsid,nlobsf));
258 %let rc = %sysfunc(close(&dsid));
259 %let here2 = %sysfunc( datetime(),best18 ) ;
260 %let durn2 = %sysevalf( &here2 - &star2 ) ;
261 %put with where rc=&rc dsid= &dsid num=&num durn= &durn2 %now ;
with where rc=0 dsid= 1 num=9500000 durn= 25.1400001049041 07AUG2009:12:34:03.821
With a WHERE clause, interestingly, as durn= a smaller number when the where clause returns a small number of rows, and larger when there are more to count ...
342 %let star2 = %sysfunc( datetime(),best18 ) ;
343 %let dsid = %sysfunc(open(temp.sample (where=(x>9500000))));
344 %let num = %sysfunc(attrn(&dsid,nlobsf));
345 %let rc = %sysfunc(close(&dsid));
346 %let here2 = %sysfunc( datetime(),best18 ) ;
347 %let durn2 = %sysevalf( &here2 - &star2 ) ;
348 %put with where rc=&rc dsid= &dsid num=&num durn= &durn2 %now ;
with where rc=0 dsid= 1 num=500000 durn= 3.375 07AUG2009:15:34:58.682
349 %let star2 = %sysfunc( datetime(),best18 ) ;
350 %let dsid = %sysfunc(open(temp.sample (where=(x>1))));
351 %let num = %sysfunc(attrn(&dsid,nlobsf));
352 %let rc = %sysfunc(close(&dsid));
353 %let here2 = %sysfunc( datetime(),best18 ) ;
354 %let durn2 = %sysevalf( &here2 - &star2 ) ;
355 %put with where rc=&rc dsid= &dsid num=&num durn= &durn2 %now ;
with where rc=0 dsid= 1 num=9999999 durn= 24.2490000724792 07AUG2009:15:35:41.119
it appears that where-clause handling is more efficient than the counting. ~~~~~~
PeterC Message was edited by: Peter.C
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
wow! thanks rob and peter. learned something new from your posts.
regards,
milton
regards,
milton