BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Mirisage
Obsidian | Level 7

Hi SAS Forum;

Could you please help me on this layman question.

I ran the below code for my large data set to see how many records are there.

Proc contents data=huge_data_set varnum;
run;

The output table generated has a dot in front of the "number of observations"
and it does not show the real number of observations in the data set.


Q:Why Proc contents doesn't provide the "number of observations" ?

Thanks for your time
Mirisage

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

And apparently from another thread, if the number records exceeds the maximum value that SAS will store in a numeric variable then Contets reports missing. Depending on how precise you need the record count, you can get an estimate from the the Engine/host dependent information. Use ("Number of data set pages" - "First Data Page")* "Max obs per page".

The first "pages" store the header information and varies depending on numbers of variables and lengths of labels and such.

You could try looking up the number of observations in the Dictionary table Members, but will probably find the number of records there missing as well.

View solution in original post

17 REPLIES 17
Astounding
PROC Star

There are several possibilities as to why the number of observations would be unknown.

1. The data is stored on tape.

2. The data is stored on disk, but in tape format.

3. The data set is actually a view rather than a data set.

You probably already know which of these applies ... but if you don't then it's likely to be #2.

Good luck.

ballardw
Super User

And apparently from another thread, if the number records exceeds the maximum value that SAS will store in a numeric variable then Contets reports missing. Depending on how precise you need the record count, you can get an estimate from the the Engine/host dependent information. Use ("Number of data set pages" - "First Data Page")* "Max obs per page".

The first "pages" store the header information and varies depending on numbers of variables and lengths of labels and such.

You could try looking up the number of observations in the Dictionary table Members, but will probably find the number of records there missing as well.

Mirisage
Obsidian | Level 7

Hi Astounding and ballardw,

Many thanks to both of you for providing me with this knowledge.

Best regards

Mirisage

Scott_Mitchell
Quartz | Level 8

What happens if you use the following?

%LET DSID  = %SYSFUNC(OPEN(WORK.NEWFILES,I));

%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));

Does the macro variable store the correct number of observations?

Mirisage
Obsidian | Level 7

Hi Scott_Mitchell,

I am very curious to run your code and see.

My very large data set is stored in a warehouse.

Lets say I assign the library name to the warehouse as "location".

Then the proc contenets code that has not provided us with the actual number of observations is like below.


Proc contents data=location.huge_data_set varnum;

run;


I can understand only that much.


Question:

Then what should I do to apply your code?

Is that just copy your below code into my sas editor and running?

%LET DSID  = %SYSFUNC(OPEN(WORK.NEWFILES,I));

%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));


Please let me know.

Thanks


Mirisage


Fugue
Quartz | Level 8

Take Scott's code, paste it into a SAS program and submit it. Add %put statements to output the results to the LOG. Something like this . . .

%LET DSID  = %SYSFUNC(OPEN(WORK.NEWFILES,I));

%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));

%PUT the number of observations is &TOTAL;

Scott_Mitchell
Quartz | Level 8

You also need to change WORK.NEWFILES to the name of your dataset LOCATION.HUGE_DATA_SET.  Probably self evident, but just in case.

%LET DSID  = %SYSFUNC(OPEN(WORK.NEWFILES,I));

%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));

%PUT the number of observations is &TOTAL;

Mirisage
Obsidian | Level 7

Hi Fugue and Scott_Mitchell,

Thanks. Then I ran the follwing SAS code.

%LET DSID  = %SYSFUNC(OPEN(LOCATION.HUGE_DATA_SET,I));

%LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));

%PUT the number of observations is &TOTAL;

It produced the following log note.

%LET DSID = %SYSFUNC(OPEN(pcfwh.U_PCF_PROPENSITY,I));

16

17 %LET TOTAL = %SYSFUNC(ATTRN(&DSID.,NOBS));

SYMBOLGEN: Macro variable DSID resolves to 2

18

19 %PUT the number of observations is &TOTAL;

SYMBOLGEN: Macro variable TOTAL resolves to -1

the number of observations is -1

It seems that the number of observations is not given correctly.

Thanks anyway to both of you.

Mirisage

ballardw
Super User

Which isn't really surprising. The ATTRN function is going to have the same numeric storage/display limitations as any other SAS numeric value.

I would wonder if the data warehouse has an equivalent of the SAS dictionary tables that could be queried.

Scott_Mitchell
Quartz | Level 8

I was hoping for an error or a missing value, as opposed to -1 for number of observations not available.

I can't believe there isn't an accurate way to identify the number of observations in a dataset that size. 

Fugue
Quartz | Level 8

What version of SAS are you running?

Fugue
Quartz | Level 8

The reason I asked about the version of SAS you are running is that there is an EXTENDOBSCOUNT data set option beginning in SAS 9.3 that is designed for use with large datasets. I haven't thoroughly read the SAS documentation and we don't have datasets anywhere near the necessary size limits, so I'm not sure whether this option will have any effect on observation counts produced by PROC CONTENTS, etc.

Anyone know?

Mirisage
Obsidian | Level 7

Hi Tom and Fugue,

Sorry I could not respond you.

For Tom's question, What do you mean by a 'warehouse'? It is a Oracle data base. May be I cannot give the name of the data base for confidentiality

For Fugue's question: I am using SAS Ver 9.2 in Windows. It is emebedded in SAS EG ver 4.3. (My IT knowledge is not good, but I think we can say like ....."embedded in SAS EG").

Thanks every one for this discussion and great knowledge.

Regards

Mirisage

Scott_Mitchell
Quartz | Level 8

Could he use sql pass thru and run a query to count the number of rows and convert the results to character before they are passed back to SAS?

You could also try the system table all_tables  column num_rows however this isn't always up to date.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 8374 views
  • 4 likes
  • 6 in conversation