When entering INPUT statements for non SAS datasets and you want to specify the length of a character variable, is there a way to determine the longest length of a variable in a large dataset? To be clcear, pretend that I have a variable for Address and I want to allocate enough spaces for the length of that variable but am not sure of the length the longest address in my data set because it is a large dataset. Is there a command that will allow you to identify the length of the longest Address variable so you don't inadvertantly cut off an address?
I don't know if any database contains metadata for min/max length of character columns.
So your best bet is to read the whole column and keep score of the maximum length, like
proc sql;
select max(length(name)) from sashelp.class;
quit;
You could, as already suggested, just set the length to a really large number and use compress=yes to deal with the unused space, but if anybody creates a new dataset from that and forgets to add the compress option, they will produce unnecessary large dataset files.
A normal sort might also crack your WORK space, as the utility files of proc sort are not compressed.
Sorting a large compressed dataset is best done with the tagsort option, BTW.
When you say from a data set I assume you mean from a data file. There is not way to know the length without measuring. PROC IMPORT may be able to help with this if you set guessing rows large enough. Otherwise you can define all the character variable with very long say $512 and then measure and redefine the length and create new data with proper. SAS a some code for that I will search you can too.
How do you know that you've reached the end of any given address?
Tom
Yes.
Here is a thread that address a similar issue
You have some good advice above on the techincal aspects of this. Me personally, this exemplifies the need for an agreement between vendor and recipient in a data transfer. A clearly written Data Transfer Specification which details how the data is sent, what the data structure is, plus any other necessary information, signed off by both parties is the only way to acurately transfer data from one party to another. Sure, you could guess the maximum length, however next data transfer that could change, or maybe they drop that variable, or change it to number, or move it etc. Documentation is more important than anything else on any process, document, test, then do the process. I can't count the number of times I have seen programmers having to re-write code each time data is recieved, or having to write test scripts on the data etc. purely because they don't have that control over it.
As @RW9 writes - and I coudn't agree more - and Interface contract is a MUST and anything else is simply unprofessional when it comes to regular data exchange between applications and systems.
I've been this year in a project where such an Interface contract hadn't been created (rush rush, save time....) and - oh wonder - there was in the end a lot of rework required with direct impact on time, cost and quality of the final code.
If it's a once off then using Proc Import with guessingrows set to max, or assigning character lengths which are certainly long enough, or eventually even writing a script which first analysis the strings and then generates the lengths for the input statement dynamycally would work as well.
I don't know if any database contains metadata for min/max length of character columns.
So your best bet is to read the whole column and keep score of the maximum length, like
proc sql;
select max(length(name)) from sashelp.class;
quit;
You could, as already suggested, just set the length to a really large number and use compress=yes to deal with the unused space, but if anybody creates a new dataset from that and forgets to add the compress option, they will produce unnecessary large dataset files.
A normal sort might also crack your WORK space, as the utility files of proc sort are not compressed.
Sorting a large compressed dataset is best done with the tagsort option, BTW.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.