@allywang wrote:
I am trying to concatenate around 200+ .txt files into a single data set where they are stacked on top of each other and also insert column headers/variable names. There are no column names in the data as its a .txt file of tabulated data but all should follow a ordered format which aligns with column headers that I have.[ all my variables in after input@ are in this order] I also want the file name to be another variable column header in the data set. I know there was a post about appending a .txt folder similar to this but for some reason my code is not producing the right results? I opened the created data set and got "."in all my fields. Does this have something to do with informat/format use? Also it seems there is a length cap on variable names, is the format simply $20. to extend the width? [I am sorry, I am a new SAS user and trying to wrap my head around the nuances so forgive me if I am making big simple mistakes]
I have pasted my code below. I have redacted my file and variable names
libname peer "pathname";
data name; length filename $600; infile "pathname" eov=eov filename=filename ENCODING="WLATIN1" dsd dlm='09'x truncover LRECL=32767; input@ Variable 1 Variable 2: $20. 83 more variables
Run;
The example txt file you posted does NOT use '09'x (tab) as delimiter. It is using the '|' character. Something similar to
data name;
length filename $600;
infile "pathname\*.txt" eov=eov filename=filename ENCODING="WLATIN1" dsd dlm='|' truncover LRECL=32767;
input
Variable1
Variable2 : $20.
/* other variables*/
; /* don't forget to end the input*/
Run;
Has a slightly better chance of running.
You variable names cannot have spaces in them. As previously mentioned if you have documentation to set the informat and variable names that mean something you would be a tad ahead.
When you get this working at a minimum you may want to consider, based on your data description, using some custom informats. You have a field that has ranges such as 65-74 (looking amazing like 10 year age intervals) that include 9999, which isn't. If the 9999 represents missing data you should consider reading that as missing value. Similar with the NA values. You have at least one field that mixes NA with spaces, so that might require some consideration as to coding
... View more