05-23-2018 09:17 AM
Hi, I thought this quite a bit but didn't come up with a proper argument against it. So why on earth SAS won't just take the max length of each variable as the new length after a set statement? I guess this must be somehow related to PDV processing of data. But wouldn't it be possible for SAS to run some preliminary checks for the lengths? Now the data needs to be processed multiple times to get the desired outcome, which is really wasting resources A LOT. And for those who just propose me to change the varlength check option, I can already tell that is not an option in my case.
So if someone could explain to me why the things are as they are I would be really glad. This clearly is related to my limited SAS knowledge so I would be educating myself here also. Thanks.
05-23-2018 09:29 AM
The variables included in a dataset that's named in a set statement will keep their lengths, for obvious reasons.
New variables are set to a fitting length wherever this can be determined from the circumstances (length of existing variables to the right of the equal sign, length of literals), or to some specific length as stated in the documentation (eg some functions set the result to $200 if not specified otherwise).
In the case of reading external data, it is the job of the programmer to set the lengths according to the documentation (s)he got along with the file. When reading from other databases, SAS does it's best to accommodate the lengths specified in the DBMS.
05-23-2018 09:31 AM
Surely your functional design specification for the process will incorporate the various data models, and that would highlight various differences and what to do about those? I really don't like the approach of letting software which has very little knowledge of my data make changes (i.e. implicit), whereas me who knows my data and processes fully knows how to do the process (explicit). For instance, there are many ways to set lengths explicitly:
data want; length var1 var2 $100; set have1 have2; run;
proc sql; create table want as select var1 as var1 length $100, var2 as var2 length $100 from have1 union all select * fro have2; quit;
I wouldn't for instance want my ID which is 8 character, being expanded out to a longer string just because some dodgy Excel file has been imported.
Yes, its also likely a restriction of the PDV, as the structure of the PDV is created up front, so it would be a fairly large change to implement this.
05-23-2018 09:44 AM
The SAS data step compiler is not really that smart. It really behaves like a single pass compiler. It does not attempt to gather all of the information and then re-process the statements in light of that gather information.
Basically it sets the definition for a variable as soon as it can. Once it has determined the type/length for a variable it does not change it.
When referencing the same variable from multiple sources the first one to contain the variable set the type/length for the variable.
However for FORMAT/INFORMAT attributes the last one to declare a preference "wins".
If you are working with planned data then make a plan for how the data should be formatted. For example you could just have a file with the LENGTH or ATTRIB statements
data want ; %include 'table_defintions.sas'; merge table1 table2 ; by id; run;
or a template dataset.
data want ; merge template(obs=0) table1 table2 ; by id; run;
If you are working with random or unknown inputs then it is not hard to create a macro that can check the input datasets and generate a consolidated set of variables for you.
05-23-2018 10:19 AM
Suppose you have a process such that a variable at some point must not exceed a specific length, such as to be incorporated into an existing data base.
And you have a data merge with a data set that has that variable exceeding that length.
Do you really want an undocumented process to automatically increase the length of the variable so that the process fails, mysteriously, at a later point? Or results in other unacceptable data?