01-18-2016 12:39 AM
When I use 'input' to import some variables from txt into SAS, can I omit some variables?
For example, raw data:
Variable: A B C D E F .... ZZ
obs: 1 2 3 5 2 3 ... 4
2 4 4 4 5 2 6
I just want to import variables 'A', 'D', and 'ZZ' into SAS.
infile 'C:\xxx' dlm='09'x TRUNCOVER DSD lrecl=32767 firstobs=2;
input ???; What should I code here? It seems that I have to list all variables in the order they appear in the raw data.
01-18-2016 01:56 AM
Write a complete input statement that takes care of all columns; this will make it easier to maintain the code in the future, as your input corresponds column to column to the file specification.
Use the keep statement to only include the columns you want in the output table.
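A minimal sketch of this read-everything-then-keep approach, assuming for illustration a file with only six tab-delimited numeric columns (A through F) rather than the full A to ZZ layout. The output data set name `want` and the six-column layout are assumptions, not taken from the original post.

```sas
/* Sketch: read every column, then keep only the ones of interest. */
/* Assumes six numeric, tab-delimited columns; adjust the INPUT    */
/* list to match the real file layout.                             */
data want;
    infile 'C:\xxx' dlm='09'x truncover dsd lrecl=32767 firstobs=2;
    input A B C D E F;   /* one variable per column, in file order */
    keep A D;            /* only these reach the output data set   */
run;
```

The INPUT list still has to name every column up to the last one you need, but the KEEP statement makes it explicit which ones survive.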
01-18-2016 02:55 AM
You could try proc import or use a column pointer, if the columns of your text file remain the same:
Filename InTxt 'C:\xxx.TXT';
Data aaa;
  Infile Intxt Firstobs=2 Delimiter='09'x;
  Informat A 8. D 8. ZZ 8.; * format correct?;
  Input A +3 D +10 ZZ; * pointer numbers need to be adjusted;
Run;
01-18-2016 07:49 AM - edited 01-18-2016 07:50 AM
In the past I've mostly used dummy variables (e.g. character with length $1, since they can absorb both numeric and character values) to temporarily capture the unwanted columns when reading unaligned, delimited raw data:
length dummy dummy1-dummy11 $1; /* character dummies accept numeric or character fields */
input a dummy dummy d dummy1-dummy11 p dummy1-dummy8 y;
drop dummy:;
But maybe it's more elegant to scan the raw data records (if you know for sure the "column" numbers [in the sense of a delimited file] of the variables of interest):
input;
A  = input(scan(_infile_,   1, '09'x, 'm'), 8.); /* Please use  */
D  = input(scan(_infile_,   4, '09'x, 'm'), 8.); /* appropriate */
ZZ = input(scan(_infile_, 702, '09'x, 'm'), 8.); /* informats.  */
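Please note the use of the 'm' modifier (see documentation of the SCAN function). With 'm', consecutive delimiters mark an empty field instead of being collapsed into one, so a missing value does not shift the later columns.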
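To illustrate why the 'm' modifier matters, here is a small sketch using a made-up record with an empty second field. The variable names `rec`, `with_m`, and `without_m` are hypothetical and only serve the demonstration.

```sas
/* Sketch: effect of the 'm' modifier on a record with an empty field. */
data _null_;
    rec = '1' || '09'x || '09'x || '5';     /* fields: 1, (empty), 5        */
    with_m    = scan(rec, 2, '09'x, 'm');   /* second field is empty        */
    without_m = scan(rec, 2, '09'x);        /* the two tabs collapse, so    */
                                            /* the second word is 5         */
    put with_m= without_m=;                 /* writes both values to the log */
run;
```

Without 'm', every field after a missing value would be read one position too early, which is why it is worth specifying whenever the file can contain empty fields.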
01-19-2016 11:18 AM
There are two situations that will usually do what you want correctly.
The first is if the data is fixed column, meaning the values you want ALWAYS occur in the exact same columns (they can be padded at the end with blanks without harm). This is very old school, but the input statement is easy to write:
input var1 1-10 var2 21-27 var3 26-38; /* the overlap in var2 and var3 is intentional to demonstrate how FIXED column input works: each variable reads its own column range, independently of the others */
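As a rough sketch of column input skipping unwanted data, the step below reads only columns 1-3 and 10-12 of each record and never looks at what lies in between. The data set name `fixed`, the variable names, and the sample records are all made up for illustration.

```sas
/* Sketch: column input reads only the listed column ranges. */
data fixed;
    input id 1-3 score 10-12;   /* columns 4-9 are never read */
    datalines;
101 skip 087
102 this 093
;
run;
```

The data lines must start in column 1, since column input counts positions from the beginning of each record.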
The other is NAMED input. In named input, each value is preceded by the name of its variable and an equals sign:
var3=John Smith var1 = something else var6=My favorite song var10=dog is named Spot
and so on.
Since every value is preceded by its name, an input statement that reads just 3 variables when more than 3 named variables are present would look like this:
input firstname= lastname= value=;
You would likely also want informats that specify how to read the values.