- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi SAS Forum;
Could you please help me on this layman question.
I ran the below code for my large data set to see how many records are there.
Proc contents data=huge_data_set varnum;
run;
The output table generated has a dot in front of the "number of observations"
and it does not show the real number of observations in the data set.
Q:Why Proc contents doesn't provide the "number of observations" ?
Thanks for your time
Mirisage
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And apparently from another thread, if the number records exceeds the maximum value that SAS will store in a numeric variable then Contets reports missing. Depending on how precise you need the record count, you can get an estimate from the the Engine/host dependent information. Use ("Number of data set pages" - "First Data Page")* "Max obs per page".
The first "pages" store the header information and varies depending on numbers of variables and lengths of labels and such.
You could try looking up the number of observations in the Dictionary table Members, but will probably find the number of records there missing as well.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are several possibilities as to why the number of observations would be unknown.
1. The data is stored on tape.
2. The data is stored on disk, but in tape format.
3. The data set is actually a view rather than a data set.
You probably already know which of these applies ... but if you don't then it's likely to be #2.
Good luck.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And apparently from another thread, if the number records exceeds the maximum value that SAS will store in a numeric variable then Contets reports missing. Depending on how precise you need the record count, you can get an estimate from the the Engine/host dependent information. Use ("Number of data set pages" - "First Data Page")* "Max obs per page".
The first "pages" store the header information and varies depending on numbers of variables and lengths of labels and such.
You could try looking up the number of observations in the Dictionary table Members, but will probably find the number of records there missing as well.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Astounding and ballardw,
Many thanks to both of you for providing me with this knowledge.
Best regards
Mirisage
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What happens if you use the following?
%LET DSID = %SYSFUNC(OPEN(WORK.NEWFILES,I));
%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));
Does the macro variable store the correct number of observations?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Scott_Mitchell,
I am very curious to run your code and see.
My very large data set is stored in a warehouse.
Lets say I assign the library name to the warehouse as "location".
Then the proc contenets code that has not provided us with the actual number of observations is like below.
Proc contents data=location.huge_data_set varnum;
run;
I can understand only that much.
Question:
Then what should I do to apply your code?
Is that just copy your below code into my sas editor and running?
%LET DSID = %SYSFUNC(OPEN(WORK.NEWFILES,I));
%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));
Please let me know.
Thanks
Mirisage
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Take Scott's code, paste it into a SAS program and submit it. Add %put statements to output the results to the LOG. Something like this . . .
%LET DSID = %SYSFUNC(OPEN(WORK.NEWFILES,I));
%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));
%PUT the number of observations is &TOTAL;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You also need to change WORK.NEWFILES to the name of your dataset LOCATION.HUGE_DATA_SET. Probably self evident, but just in case.
%LET DSID = %SYSFUNC(OPEN(WORK.NEWFILES,I));
%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));
%PUT the number of observations is &TOTAL;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Fugue and Scott_Mitchell,
Thanks. Then I ran the follwing SAS code.
%LET DSID = %SYSFUNC(OPEN(LOCATION.HUGE_DATA_SET,I));
%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));
%PUT the number of observations is &TOTAL;
It produced the following log note.
%LET DSID = %SYSFUNC(OPEN(pcfwh.U_PCF_PROPENSITY,I));
16
17 %LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));
SYMBOLGEN: Macro variable DSID resolves to 2
18
19 %PUT the number of observations is &TOTAL;
SYMBOLGEN: Macro variable TOTAL resolves to -1
the number of observations is -1
It seems that the number of observations is not given correctly.
Thanks anyway to both of you.
Mirisage
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Which isn't really surprising. The ATTRN function is going to have the same numeric storage/display limitations as any other SAS numeric value.
I would wonder if the data warehouse has an equivalent of the SAS dictionary tables that could be queried.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I was hoping for an error or a missing value, as opposed to -1 for number of observations not available.
I can't believe there isn't an accurate way to identify the number of observations in a dataset that size.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What version of SAS are you running?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The reason I asked about the version of SAS you are running is that there is an EXTENDOBSCOUNT data set option beginning in SAS 9.3 that is designed for use with large datasets. I haven't thoroughly read the SAS documentation and we don't have datasets anywhere near the necessary size limits, so I'm not sure whether this option will have any effect on observation counts produced by PROC CONTENTS, etc.
Anyone know?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Tom and Fugue,
Sorry I could not respond you.
For Tom's question, What do you mean by a 'warehouse'? It is a Oracle data base. May be I cannot give the name of the data base for confidentiality
For Fugue's question: I am using SAS Ver 9.2 in Windows. It is emebedded in SAS EG ver 4.3. (My IT knowledge is not good, but I think we can say like ....."embedded in SAS EG").
Thanks every one for this discussion and great knowledge.
Regards
Mirisage
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Could he use sql pass thru and run a query to count the number of rows and convert the results to character before they are passed back to SAS?
You could also try the system table all_tables column num_rows however this isn't always up to date.