BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LisaD
Fluorite | Level 6

When entering INPUT statements for non SAS datasets and you want to specify the length of a character variable, is there a way to determine the longest length of a variable in a large dataset? To be clcear, pretend that I have a variable for Address and I want to allocate enough spaces for the length of that variable but am not sure of the length the longest address in my data set because it is a large dataset. Is there a command that will allow you to identify the length of the longest Address variable so you don't inadvertantly cut off an address?

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

I don't know if any database contains metadata for min/max length of character columns.

So your best bet is to read the whole column and keep score of the maximum length, like

proc sql;
select max(length(name)) from sashelp.class;
quit;

You could, as already suggested, just set the length to a really large number and use compress=yes to deal with the unused space, but if anybody creates a new dataset from that and forgets to add the compress option, they will produce unnecessary large dataset files.

A normal sort might also crack your WORK space, as the utility files of proc sort are not compressed.

Sorting a large compressed dataset is best done with the tagsort option, BTW.

View solution in original post

10 REPLIES 10
data_null__
Jade | Level 19

When you say from a data set I assume you mean from a data file.  There is not way to know the length without measuring.  PROC IMPORT may be able to help with this if you set guessing rows large enough.  Otherwise you can define all the character variable with very long say $512 and then measure and redefine the length and create new data with proper.  SAS a some code for that I will search you can too.

LisaD
Fluorite | Level 6
Thank you data_null__!
If you find anything further, an entire classroom of graduate students will benefit from your findings. 🙂
TomKari
Onyx | Level 15

How do you know that you've reached the end of any given address?

 

Tom

LisaD
Fluorite | Level 6
Great question TomKari!
The data might be set up:
With one record on multiple lines, in which case the address is complete at the end of a line
With a delimiter of some sort, in which case the delimiter would be the end
In column format, in which case, the address would begin or end at a particular column. In this case, I would like use the position of the following column to tell me how long the Address column should be
But if I allocate $35. for an address in a dataset that has thousands of records, I might not see that there is an address that needs $40. for the length so it’s not cut off.
Do you just make up a super high number like $65. to be sure you don’t have a problem?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You have some good advice above on the techincal aspects of this.  Me personally, this exemplifies the need for an agreement between vendor and recipient in a data transfer.  A clearly written Data Transfer Specification which details how the data is sent, what the data structure is, plus any other necessary information, signed off by both parties is the only way to acurately transfer data from one party to another.  Sure, you could guess the maximum length, however next data transfer that could change, or maybe they drop that variable, or change it to number, or move it etc.  Documentation is more important than anything else on any process, document, test, then do the process.  I can't count the number of times I have seen programmers having to re-write code each time data is recieved, or having to write test scripts on the data etc. purely because they don't have that control over it.  

Patrick
Opal | Level 21

As @RW9 writes - and I coudn't agree more - and Interface contract is a MUST and anything else is simply unprofessional when it comes to regular data exchange between applications and systems.

I've been this year in a project where such an Interface contract hadn't been created (rush rush, save time....) and - oh wonder - there was in the end a lot of rework required with direct impact on time, cost and quality of the final code.

 

If it's a once off then using Proc Import with guessingrows set to max, or assigning character lengths which are certainly long enough, or eventually even writing a script which first analysis the strings and then generates the lengths for the input statement dynamycally would work as well.

 

 

Peter_C
Rhodochrosite | Level 12
Use the column length defined on the source system and SAS dataset option COMPRESS= YES
Kurt_Bremser
Super User

I don't know if any database contains metadata for min/max length of character columns.

So your best bet is to read the whole column and keep score of the maximum length, like

proc sql;
select max(length(name)) from sashelp.class;
quit;

You could, as already suggested, just set the length to a really large number and use compress=yes to deal with the unused space, but if anybody creates a new dataset from that and forgets to add the compress option, they will produce unnecessary large dataset files.

A normal sort might also crack your WORK space, as the utility files of proc sort are not compressed.

Sorting a large compressed dataset is best done with the tagsort option, BTW.

Peter_C
Rhodochrosite | Level 12
I would have concern for the future lengths.
Today's max.length might not support the data arrriving next.
Of greater interest would be the shape of the distribution of length frequency.
Select length( column) as len, count(*) as freq
From source.table group by 1
Armed with that info you might judge a length sufficient for current max plus a margin for future growth. You might also need to warn the data supplier that their data appears to be truncated in some cases.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 15316 views
  • 14 likes
  • 7 in conversation