@wpliao wrote:
@Tom thanks for your suggestions. I will fix the length afterwards since PROC TRANSPOSE sets the length.
Thanks for the suggestion on the spaces/tabs. I was not aware of that. I have revised my codebase.
Other tips are welcome. It seems the other changes in your examples were removing the formatting, length, and infile statements. I thought those were good practices?
The VXX variables were actually not in sequence, so I think I can't use the V00-V06 syntax. Please kindly correct me if I'm mistaken.
Attaching FORMATS is only needed when you want the values displayed in a special way. Attaching $xx. formats to character variables can cause problems in exactly the situation you were complaining about: the same variable having different lengths in two input datasets.
Consider the case where NAME has length $8 in dataset ONE and has the $8. format attached. If you change the length to $32 without also changing the format, then only the first 8 bytes of the name will be displayed.
Or if you SET two datasets together, where the first has the variable with a longer length and no format and the second has the same variable with a shorter display format attached, then the result is a variable with the longer length (so no data is lost) whose values are displayed truncated. That can lead to much confusion (and possibly wrong analysis results).
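Here is a small sketch of that second situation. The dataset names ONE, TWO and BOTH and the sample values are made up for illustration; the point is only that the length comes from the first dataset and the format from the first dataset that has one attached.

```sas
/* Hypothetical datasets, for illustration only */
data one;
  length name $32;               /* longer length, no format attached */
  name = 'Bartholomew Montgomery';
run;

data two;
  length name $8;
  format name $8.;               /* shorter display format attached */
  name = 'Alfred';
run;

data both;
  set one two;                   /* length taken from ONE, format from TWO */
run;

proc contents data=both; run;    /* NAME should show length 32 with format $8. */
proc print data=both; run;       /* long value is stored in full but displayed cut to 8 bytes */

/* One way to remove the format so the full values are displayed again */
proc datasets lib=work nolist;
  modify both;
  format name;                   /* FORMAT with no format name removes the attached format */
quit;
```

Removing the format (or never attaching it) is usually simpler than trying to keep the format width in sync with the length.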
An INFILE statement with in-line data is only needed if you want to use some of the INFILE statement's options. If your in-line data lines are delimited with spaces, have no spaces embedded in the values (or only single embedded spaces, with values delimited by two or more spaces so that you can use the & modifier in your INPUT statement), and have a value for every variable on every line, then you do not need the DLM=, DSD, or TRUNCOVER options, and so you do not need the INFILE statement at all.
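For example, both of these steps read in-line data with no INFILE statement. The dataset names, variable names and values are invented for the sketch. The second step uses the & modifier, so a value ends only at two or more consecutive spaces.

```sas
/* Simple space-delimited values, no embedded spaces */
data simple;
  length id 8 name $20 score 8;
  input id name score;
datalines;
1 Alice 90
2 Bob 85
;

/* Values with single embedded spaces: the & modifier keeps reading */
/* NAME until it finds two or more spaces in a row                  */
data embedded;
  length id 8 name $20 score 8;
  input id name & score;
datalines;
1 Mary Ann Smith  90
2 Bob Jones  85
;
```

Because NAME is already defined as character by the LENGTH statement, the INPUT statement needs neither a $ nor an informat.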
In general, the order of the variables in the dataset does not matter when you use a numeric-suffixed variable list. The main thing is that all of the implied variables exist (or that you want SAS to create them if they don't).
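A quick sketch of both points, using a made-up dataset HAVE whose V variables are created out of order:

```sas
/* Hypothetical data: V03 is created before V01 and V02 */
data have;
  v03 = 3; other = 'x'; v01 = 1; v02 = 2;
run;

data check;
  set have;
  total = sum(of v01-v03);   /* means V01, V02, V03 wherever they sit in the dataset */
  array v[6] v01-v06;        /* V04-V06 do not exist in HAVE, so SAS creates them as numeric */
  keep v01-v06 total;
run;

proc print data=check; run;
```

The list V01-V03 is expanded by name, not by position, so the variable OTHER sitting between them is not included.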
But are you talking about using the V01-V06 variable list in the INPUT statement? In that case the order does matter. But if you have defined the variables in the dataset in the order that you want to read them from the source text, then you can still use a position-based variable list (using a double dash). Example:
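data want;
  length first 8 next $10 last 8;          /* defines the variables and their order */
  infile myfile dsd truncover firstobs=2;  /* MYFILE is a fileref; reading starts at line 2 */
  input first -- last;                     /* every variable from FIRST through LAST, by position */
run;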