BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jimbarbour
Meteorite | Level 14

My poor tired eyes having trouble reading the Observations count in the output of a Proc Contents for large datasets.  For example 5556064398 would be a lot easier for me to read were it formatted 5,556,064,398.  

 

I know how to do this for, say, for the frequency in Proc Freq:

	PROC TEMPLATE;
		EDIT Base.Freq.OneWayList;
			EDIT Frequency;
				FORMAT = COMMA22.;
			END;
		END;
	RUN;

Is there a way to do this for Proc Contents? 

Listing out the template,

proc template;
   source base.contents;
run;

tells me that var cValue2 (which is used for Observations) is a character variable, 😞 so maybe I'm just out of luck.

   define cValue2;
      format = $32.;
      just = l;
   end;

But if I save the intermediate tables created by ODS, I see that there's an nValue2 var which is numeric.  Would that I could use nValue2 and apply my COMMAn. format, but since it's not a part of the Template definition, I'm not seeing how to do that.

So, again my question:  Is there a way to display the Observations count in the output of a Proc Contents such that it is formatted with a COMMAn. format?

 

Thank you,

 

Jim

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

The only way I can think of (but then my thinking is limited) is to create a custom character format for cValue2 that inserts commas when long numbers are found, something like this.

 

proc fcmp outlib=WORK.FUNCS.GENERAL;
  function commastr( STR$ ) $ ;
    length A B $80 ; 
    A=STR;
    do until (A=B);
      B=A;
      A=prxchange('s/(\d)(\d\d\d(\Z|,))/\1,\2/',1,trim(B));  
    end;
    return (A);
  endsub;
run;

options cmplib=WORK.FUNCS;

proc format ;
  value $commastr low-high=[commastr()];
run;

data T;
  A= '1234567890' ;
  putlog A= $commastr20.;
run;        

A=1,234,567,890

 

View solution in original post

9 REPLIES 9
ChrisNZ
Tourmaline | Level 20

The only way I can think of (but then my thinking is limited) is to create a custom character format for cValue2 that inserts commas when long numbers are found, something like this.

 

proc fcmp outlib=WORK.FUNCS.GENERAL;
  function commastr( STR$ ) $ ;
    length A B $80 ; 
    A=STR;
    do until (A=B);
      B=A;
      A=prxchange('s/(\d)(\d\d\d(\Z|,))/\1,\2/',1,trim(B));  
    end;
    return (A);
  endsub;
run;

options cmplib=WORK.FUNCS;

proc format ;
  value $commastr low-high=[commastr()];
run;

data T;
  A= '1234567890' ;
  putlog A= $commastr20.;
run;        

A=1,234,567,890

 

jimbarbour
Meteorite | Level 14

Chris,

 

This is brilliant.  Please accept my thanks.

 

So, what I did, and I think this is fairly practical, was put a template change into a little macro, as follows.  The &Null below is just that, a NULL.  It resolves to nothing, but in a SAS editor window it preserves code colorization.

&Null	%MACRO	Format_Contents();
	ODS PATH RESET;
	ODS PATH (PREPEND) WORK.Templat(UPDATE);

	PROC TEMPLATE;
		EDIT Base.Contents.Attributes;
			EDIT cValue2;
				FORMAT = $Commastr22.;
			END;
		END;
		EDIT Base.Contents.EngineHost;
			EDIT cValue1;
				FORMAT = $Commastr22.;
			END;
		END;
	RUN;
%MEND	Format_Contents;

Having the template change in a macro allows me to do a simple macro call before a Proc Contents, like so:

%Format_Contents;

PROC  Contents	DATA=LAB.COMBINED_LAB;
RUN;

One of course could also apply the template change in autoexec.sas or by various other means, but using a macro is simple enough and allows one to pick and choose when to apply the change.

 

And, finally, the results, shown below.

 

I had thought that a Format was the way to go to add commas to the output of Contents, but I was uncertain how to go about it, and I would not have thought of the PRXCHANGE you used.

 

Again, my thanks,

 

Jim

Proc Contents -- BEFORE

Data Set Name        LAB.COMBINED_LAB              Observations           5556064398
Member Type          DATA                          Variables              23        
Engine               V9                            Indexes                0         
Created              07/01/2020 09:34:11           Observation Length     446       
Last Modified        07/01/2020 09:34:11           Deleted Observations   0         
Protection                                         Compressed             CHAR      
Data Set Type                                      Reuse Space            NO        
Label                                              Point to Observations  YES       
Data Representation  WINDOWS_64                    Sorted                 NO        
Encoding             wlatin1  Western (Windows)                                     


                     Engine/Host Dependent Information

Data Set Page Size          65536                                          
Number of Data Set Pages    14363521                                       
Number of Data Set Repairs  0                                              
ExtendObsCounter            YES                                            
Filename                    Z:\Lab_Results\Production\combined_lab.sas7bdat
Release Created             9.0401M6                                       
Host Created                X64_SRV16                                      
Owner Name                  MS\cpcssmo                                     
File Size                   877GB                                          
File Size (bytes)           941327777792                                   
 
Run Date 2020/07/28 - Run Time 23:00:19.00

_______________________________________________________________________________________


Proc Contents -- AFTER

Data Set Name        LAB.COMBINED_LAB              Observations           5,556,064,398         
Member Type          DATA                          Variables              23                    
Engine               V9                            Indexes                0                     
Created              07/01/2020 09:34:11           Observation Length     446                   
Last Modified        07/01/2020 09:34:11           Deleted Observations   0                     
Protection                                         Compressed             CHAR                  
Data Set Type                                      Reuse Space            NO                    
Label                                              Point to Observations  YES                   
Data Representation  WINDOWS_64                    Sorted                 NO                    
Encoding             wlatin1  Western (Windows)                                                 


                     Engine/Host Dependent Information

Data Set Page Size          65,536                                         
Number of Data Set Pages    14,363,521                                     
Number of Data Set Repairs  0                                              
ExtendObsCounter            YES                                            
Filename                    Z:\Lab_Results\Product                         
Release Created             9.0401M6                                       
Host Created                X64_SRV16                                      
Owner Name                  MS\cpcssmo                                     
File Size                   877GB                                          
File Size (bytes)           941,327,777,792  

Run Date 2020/07/28 - Run Time 23:00:19.14
jimbarbour
Meteorite | Level 14

Whoops.  Just caught something.  $Commastr22. will absolutely put commas into a number, but $Commastr22. will also truncate non-number containing character fields to 22 positions.  

 

I've modified my macro as shown below to compensate for this.  I've set a default of 256 which should generally be enough for a Proc Contents but can be easily overridden if more length is needed.  I suppose I could just make it 32767 and avoid any possible truncation.  Hmm.  Have to think about that.  I suppose it's not much overhead for something like Proc Contents to default to 32767.

 

Jim

&Null	%MACRO	Format_Contents(MaxText=256);
	ODS PATH RESET;
	ODS PATH (PREPEND) WORK.Templat(UPDATE);

	PROC TEMPLATE;
		EDIT Base.Contents.Attributes;
			EDIT cValue2;
				FORMAT = $Commastr&MaxText..;
			END;
		END;
		EDIT Base.Contents.EngineHost;
			EDIT cValue1;
				FORMAT = $Commastr&MaxText..;
			END;
		END;
	RUN;
%MEND	Format_Contents;
ChrisNZ
Tourmaline | Level 20

Glad you like this solution. I must agree that the output does look better.

Be mindful that the string length in the function must be changed too if you want to exceed a length of 80.

Maybe set everything to 800 and be done with it.

 

For such a massive data set, have you considered using a compressed SPDE table?.

jimbarbour
Meteorite | Level 14

@ChrisNZ,

 

I had not considered SPDE.  I have heard of SPDE/SPDG, but I must confess I've never worked with it.  I've worked in 7 different SAS shops over time, but I've never seen it used.

 

I (very) recently picked up a copy of your book, and I briefly scanned the section on SPDE, but I haven't read the section yet.  Having not yet read your discussion of the subject, I'm not sure if the following applies, but:  The struggle I have with most SAS books is that they tell what a product does and how to generally use it, but most books do not give good criteria for when something like SPDE would be a good choice.  Particularly when a product appears large and complex, as does SPDE to me, I am unsure when it might best be employed.

 

That said, I shall now read that section in your book.

 

Jim

Tom
Super User Tom
Super User

Never had much use for SPDE.  Much easier to put the large data into database system. Especially since our IT department provides support for databases.

ChrisNZ
Tourmaline | Level 20

> Much easier to put the large data into database system

Two drawbacks:

1. Databases don't keep data sorted:

 - Every join has to sort/random access the data. Again and again.

 - SAS features such as lag() or first. can't be used.

Sorted SAS tables are a golden asset.

2. Accessing data for processing in SAS is usually much slower than reading from disk.

 

SPDE is undeservedly unknown and underused.

I wish SAS fixed a few baseless restrictions, such as no view, or no loading in SASFILE, but regardless, it is well worth using as the default engine.

ChrisNZ
Tourmaline | Level 20

The SPDE engine has many awesome features, and a number of drawbacks. See the table in the SPDE chapter for a comprehensive list.

The main benefits I use all the time are:

- Much better compression. Copy a million rows from your table and compare the file sizes. Better compression means fewer I/O operations which is almost always the performance bottleneck for SAS jobs. Don't forget to set a high partition size (for example partsize=1T) if you can't spread the table across several disks, to avoid creating thousands of files. See which compression method works best.

- Much better indexing

- On-the-fly sorting, which is faster than running a proc sort.

 

> Particularly when a product appears large and complex, as does SPDE to me, I am unsure when it might best be employed.

It is not complex to use it: it's just a libname away.

And there few cases when its limitations are an issue. Again, refer to the table to decide. 

All my jobs contain these lines:

%let wdir=%sysfunc(pathname(WORK));
libname W spde "&wdir" partsize=1T compress=binary;

and use library W instead of WORK as the temp library.

 

jimbarbour
Meteorite | Level 14

Chris:

 

Thank you for that.  I'm reading the chapter in your book.  It'll take me a while to find places to store all this in my brain such that they form a coherent whole.  It *sounds* though like it has quite a bit of potential for performance tuning.  I'm quite interested in this.

 

Jim

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 906 views
  • 5 likes
  • 3 in conversation