My poor tired eyes have trouble reading the Observations count in the output of a Proc Contents for large datasets. For example, 5556064398 would be a lot easier for me to read were it formatted as 5,556,064,398.
I know how to do this for, say, the frequency in Proc Freq:
PROC TEMPLATE;
   EDIT Base.Freq.OneWayList;
      EDIT Frequency;
         FORMAT = COMMA22.;
      END;
   END;
RUN;
Is there a way to do this for Proc Contents?
Listing out the template,
proc template;
   source base.contents;
run;
tells me that the cValue2 variable (which is used for Observations) is a character variable 😞, so maybe I'm just out of luck:
define cValue2; format = $32.; just = l; end;
But if I save the intermediate tables created by ODS, I see that there's an nValue2 variable which is numeric. Would that I could use nValue2 and apply my COMMAn. format, but since it's not part of the template definition, I'm not seeing how to do that.
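(The closest I've come is to capture the ODS table and report from it separately, roughly as sketched below with a sashelp table, but that's a second report rather than the Proc Contents output itself.)
* Workaround sketch only: capture the ODS Attributes table and print the
  numeric copy of the Observations count with a COMMA format.
  (Column names Label2/nValue2 are what I see when saving the intermediate tables.) ;
ods output Attributes=work._attr;
proc contents data=sashelp.class;
run;

proc print data=work._attr noobs;
   where Label2 = 'Observations';
   var Label2 nValue2;
   format nValue2 comma22.;
run;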
So, again my question: Is there a way to display the Observations count in the output of a Proc Contents such that it is formatted with a COMMAn. format?
Thank you,
Jim
The only way I can think of (but then my thinking is limited) is to create a custom character format for cValue2 that inserts commas when long numbers are found, something like this.
proc fcmp outlib=WORK.FUNCS.GENERAL;
   function commastr( STR $ ) $;
      length A B $80;
      A = STR;
      /* insert one comma per pass, rightmost group first, until nothing changes */
      do until (A = B);
         B = A;
         A = prxchange('s/(\d)(\d\d\d(\Z|,))/\1,\2/', 1, trim(B));
      end;
      return (A);
   endsub;
run;
options cmplib=WORK.FUNCS;

proc format;
   value $commastr low-high = [commastr()];
run;
data T;
   A = '1234567890';
   putlog A= $commastr20.;
run;
A=1,234,567,890
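If you're curious why the loop is needed: the pattern only inserts one comma per call, working from the right, so the DO UNTIL keeps applying it until the string stops changing. A quick trace outside the function (same pattern, just with a pass counter) shows it:
data _null_;
   length A B $80;
   A = '1234567890';
   do until (A = B);
      B = A;
      A = prxchange('s/(\d)(\d\d\d(\Z|,))/\1,\2/', 1, trim(B));
      pass + 1;
      putlog pass= A=;
   end;
run;
The log shows one comma added per pass: 1234567,890, then 1234,567,890, then 1,234,567,890, then a final pass with no change that ends the loop.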
Chris,
This is brilliant. Please accept my thanks.
So, what I did, and I think this is fairly practical, was put a template change into a little macro, as follows. The &Null below is just that, a NULL. It resolves to nothing, but in a SAS editor window it preserves code colorization.
&Null %MACRO Format_Contents();
   ODS PATH RESET;
   ODS PATH (PREPEND) WORK.Templat(UPDATE);
   PROC TEMPLATE;
      /* Attributes table: Observations, Variables, etc. are rendered from cValue2 */
      EDIT Base.Contents.Attributes;
         EDIT cValue2;
            FORMAT = $Commastr22.;
         END;
      END;
      /* Engine/Host table: page counts, file size, etc. are rendered from cValue1 */
      EDIT Base.Contents.EngineHost;
         EDIT cValue1;
            FORMAT = $Commastr22.;
         END;
      END;
   RUN;
%MEND Format_Contents;
Having the template change in a macro allows me to do a simple macro call before a Proc Contents, like so:
%Format_Contents;
PROC Contents DATA=LAB.COMBINED_LAB;
RUN;
One of course could also apply the template change in autoexec.sas or by various other means, but using a macro is simple enough and allows one to pick and choose when to apply the change.
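And if I want to flip back to the stock Proc Contents layout later in the same session, something like this companion macro should do it (an untested sketch on my part; it just deletes the edited copies from WORK.Templat so the SASHELP originals take over again):
&Null %MACRO Reset_Contents();
   PROC TEMPLATE;
      DELETE Base.Contents.Attributes / STORE=WORK.Templat;
      DELETE Base.Contents.EngineHost / STORE=WORK.Templat;
   RUN;
%MEND Reset_Contents;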
And, finally, the results, shown below.
I had thought that a Format was the way to go to add commas to the output of Contents, but I was uncertain how to go about it, and I would not have thought of the PRXCHANGE you used.
Again, my thanks,
Jim
Proc Contents -- BEFORE

Data Set Name        LAB.COMBINED_LAB           Observations           5556064398
Member Type          DATA                       Variables              23
Engine               V9                         Indexes                0
Created              07/01/2020 09:34:11        Observation Length     446
Last Modified        07/01/2020 09:34:11        Deleted Observations   0
Protection                                      Compressed             CHAR
Data Set Type                                   Reuse Space            NO
Label                                           Point to Observations  YES
Data Representation  WINDOWS_64                 Sorted                 NO
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          65536
Number of Data Set Pages    14363521
Number of Data Set Repairs  0
ExtendObsCounter            YES
Filename                    Z:\Lab_Results\Production\combined_lab.sas7bdat
Release Created             9.0401M6
Host Created                X64_SRV16
Owner Name                  MS\cpcssmo
File Size                   877GB
File Size (bytes)           941327777792

Run Date 2020/07/28 - Run Time 23:00:19.00

_______________________________________________________________________________________

Proc Contents -- AFTER

Data Set Name        LAB.COMBINED_LAB           Observations           5,556,064,398
Member Type          DATA                       Variables              23
Engine               V9                         Indexes                0
Created              07/01/2020 09:34:11        Observation Length     446
Last Modified        07/01/2020 09:34:11        Deleted Observations   0
Protection                                      Compressed             CHAR
Data Set Type                                   Reuse Space            NO
Label                                           Point to Observations  YES
Data Representation  WINDOWS_64                 Sorted                 NO
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          65,536
Number of Data Set Pages    14,363,521
Number of Data Set Repairs  0
ExtendObsCounter            YES
Filename                    Z:\Lab_Results\Product
Release Created             9.0401M6
Host Created                X64_SRV16
Owner Name                  MS\cpcssmo
File Size                   877GB
File Size (bytes)           941,327,777,792

Run Date 2020/07/28 - Run Time 23:00:19.14
Whoops. Just caught something. $Commastr22. will absolutely put commas into a number, but it will also truncate non-numeric character fields (like the Filename above) to 22 positions.
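A quick way to see it, using the filename from the output above:
data _null_;
   A = 'Z:\Lab_Results\Production\combined_lab.sas7bdat';
   putlog A= $commastr22.;   /* only the first 22 characters survive: Z:\Lab_Results\Product */
run;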
I've modified my macro as shown below to compensate for this. I've set a default of 256 which should generally be enough for a Proc Contents but can be easily overridden if more length is needed. I suppose I could just make it 32767 and avoid any possible truncation. Hmm. Have to think about that. I suppose it's not much overhead for something like Proc Contents to default to 32767.
Jim
&Null %MACRO Format_Contents(MaxText=256);
   ODS PATH RESET;
   ODS PATH (PREPEND) WORK.Templat(UPDATE);
   PROC TEMPLATE;
      EDIT Base.Contents.Attributes;
         EDIT cValue2;
            FORMAT = $Commastr&MaxText..;
         END;
      END;
      EDIT Base.Contents.EngineHost;
         EDIT cValue1;
            FORMAT = $Commastr&MaxText..;
         END;
      END;
   RUN;
%MEND Format_Contents;
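So the call stays the same by default, and the width can be bumped when needed:
%Format_Contents();                  /* default: $Commastr256. */
%Format_Contents(MaxText=32767);     /* maximum width, no practical risk of truncation */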
Glad you like this solution. I must agree that the output does look better.
Be mindful that the string length in the function must be changed too if you want to exceed a length of 80.
Maybe set everything to 800 and be done with it.
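For example, a sketch of the same function with both lengths raised to 800 (only the declared lengths change; everything else is as above):
proc fcmp outlib=WORK.FUNCS.GENERAL;
   function commastr( STR $ ) $ 800;   /* returned string can now be up to 800 characters */
      length A B $800;                 /* work strings raised from $80 */
      A = STR;
      do until (A = B);
         B = A;
         A = prxchange('s/(\d)(\d\d\d(\Z|,))/\1,\2/', 1, trim(B));
      end;
      return (A);
   endsub;
run;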
For such a massive data set, have you considered using a compressed SPDE table?
I had not considered SPDE. I have heard of SPDE/SPDG, but I must confess I've never worked with it. I've worked in 7 different SAS shops over time, but I've never seen it used.
I (very) recently picked up a copy of your book and briefly scanned the section on SPDE, but I haven't read it yet, so I'm not sure whether the following applies. The struggle I have with most SAS books is that they tell you what a product does and how to use it in general, but most do not give good criteria for when something like SPDE would be a good choice. Particularly when a product appears large and complex, as SPDE does to me, I am unsure when it might best be employed.
That said, I shall now read that section in your book.
Jim
Never had much use for SPDE. Much easier to put the large data into a database system, especially since our IT department provides support for databases.
> Much easier to put the large data into a database system
Two drawbacks:
1. Databases don't keep data sorted:
- Every join has to sort or randomly access the data, again and again.
- SAS features such as lag() or first. can't be used.
Sorted SAS tables are a golden asset.
2. Pulling data out of a database into SAS for processing is usually much slower than reading a SAS table from disk.
SPDE is undeservedly unknown and underused.
I wish SAS would fix a few baseless restrictions, such as no views or no loading into SASFILE, but regardless, it is well worth using as the default engine.
The SPDE engine has many awesome features, and a number of drawbacks. See the table in the SPDE chapter for a comprehensive list.
The main benefits I use all the time are:
- Much better compression. Copy a million rows from your table and compare the file sizes (a comparison sketch follows this list). Better compression means fewer I/O operations, which are almost always the performance bottleneck for SAS jobs. If you can't spread the table across several disks, don't forget to set a high partition size (for example partsize=1T) to avoid creating thousands of files. Try the compression methods to see which one works best.
- Much better indexing
- On-the-fly sorting, which is faster than running a proc sort.
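A rough way to run that comparison (the library paths are placeholders, and I've borrowed Jim's table name for the million-row source):
libname basec "C:\temp\base_test";                                  /* plain BASE library        */
libname spdec spde "C:\temp\spde_test" partsize=1T compress=binary; /* SPDE, binary compression  */

data basec.sample(compress=yes) spdec.sample;   /* write the same million rows to both */
   set lab.combined_lab(obs=1000000);
run;

proc contents data=basec.sample; run;   /* compare the reported file sizes */
proc contents data=spdec.sample; run;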
> Particularly when a product appears large and complex, as does SPDE to me, I am unsure when it might best be employed.
It is not complex to use it: it's just a libname away.
And there are few cases where its limitations are an issue. Again, refer to the table to decide.
All my jobs contain these lines:
%let wdir=%sysfunc(pathname(WORK));                  /* physical path of the WORK library   */
libname W spde "&wdir" partsize=1T compress=binary;  /* SPDE library pointed at that folder */
and use library W instead of WORK as the temp library.
Chris:
Thank you for that. I'm reading the chapter in your book. It'll take me a while to find places to store all this in my brain such that they form a coherent whole. It *sounds* though like it has quite a bit of potential for performance tuning. I'm quite interested in this.
Jim