07-13-2017 02:12 PM
I am using SAS 9.4 (English).
I am using proc sql to create a table. Here is my question: Other than type, do the variable attributes (length, format, informat) of the matching variables have to be identical in each dataset? For example, consider the following code:
proc sql; create table data3 as select a.*, b.* from data1 as a left join data2 as b on a.var1 = b.var1 order by var1, year; quit;
Suppose that var1 in data1 and var1 in data2 are both numeric (type) but differ in length, format, and informat. Would I still be able to create a table from data1 and data2? Or would I have to make sure that length, format, and informat are identical as well?
Thanks for the help.
07-13-2017 02:18 PM
This will get you at least a WARNING as you used a.*,b.* and var1 is present in both datasets.
You should clearly state which var1 goes into the output dataset.
If your var1 contains fractions or large enough values, different lengths might lead to problems because of different numeric precision.
07-13-2017 03:06 PM
I appreciate your assistance, Kurt. As you can see in my reply to Schmuel, I see that the lengths of my matching variables are identical. So, there should be no concerns there. Regarding the WARNING, I have received that message in the past. I always ignore it, assuming that, referring to the code that I included in my original post, it will keep the var1 in the "a" dataset (data1). Is that not correct?
07-14-2017 02:05 AM - edited 07-19-2017 07:55 AM
As a rule, I do not allow my code to throw any WARNINGs (let alone ERRORs!) at all; I even make sure that no NOTEs aside from those that indicate which dataset was created (with x variables and y records) appear. That way, anytime something untoward happens, I am immediately alarmed to that fact.
Your "ignored warnings" would prevent your programs from being used in a scheduling environment, as you constantly get non-zero return codes.
This is all the basis for my Maxims 23 and 24, and has served me well for the past decades.
07-13-2017 02:27 PM
Formats and inforrmats do not influence comparison of variables.
Informats define how to interpret input into sas variable, while format defines how to display it.
Numeric type variables comparison is not influenced by length.
Character type variables - comparison will be byte by byte until 1st non blank unequal character.
The short variable will be treated as padded with blanks to the length of the longer variable.
In most cases length do not influence comparison. I'm not sure when comparison done by SQL.
You can make a small test and check it.
07-13-2017 02:49 PM
Thank you for your response, Shmuel. Based on your response, I have a quick question: When using proc sql to create a table, as the long as the type and length of the matching variables are identical, is it safe to assume that there should be no issues? For example, let me refer to the code that I included in my original post:
var1 in data1: type = numeric; length = 8; format = 11.; informat = 11.
var1 in data2: type = numeric; length = 8; format = 6.; informat = 6.
Am I correct in assuming that there should be no issues because var1 in data1 and var1 in data2 have the same type and length.
07-13-2017 03:14 PM
According to the informat I undestand that there are no decimal digits, only integers.
I believe you will have no any issues with comparison.