BookmarkSubscribeRSS Feed
allywang
Fluorite | Level 6

I am trying to concatenate around 200+ .txt files into a single data set where they are stacked on top of each other and also insert column headers/variable names. There are no column names in the data as its a .txt file of tabulated data but all should follow a ordered format which aligns with column headers that I have.[ all my variables in after input@ are in this order] I also want the file name to be another variable column header in the data set. I know there was a post about appending a .txt folder similar to this but for some reason my code is not producing the right results? I opened the created data set and got "."in all my fields. Does this have something to do with informat/format use?  Also it seems there is a length cap on variable names, is the format simply $20. to extend the width? [I am sorry, I am a new SAS user and trying to wrap my head around the nuances so forgive me if I am making big simple mistakes] 

 

I have pasted my code below. I have redacted my file and variable names 

 

libname peer "pathname";

data name;
length filename $600;
infile "pathname" eov=eov filename=filename ENCODING="WLATIN1" dsd dlm='09'x truncover LRECL=32767;
input@
Variable 1
Variable 2: $20.
83 more variables 


Run;


13 REPLIES 13
heffo
Pyrite | Level 9

Hi, there are some issues here with your code. But first lets talk about the variables. The variables can be either numeric or strings and SAS will assume numeric unless you tell it otherwise. So, all your "." is because SAS is trying to read a text but store it as a number and that wont work. The text being letters, not numbers stored as text. 

 

I would first tell SAS what variables we have and their type plus lenght. then read the data and tell SAS where to store the values. 

data name; 
*The order you want the variables in the table;
 length var1 $ 600 var2 8 nextVar $ 32;  *var1 is a string ($) with maximum 600 bytes long, var2 is a numeric stored in 8 bytes (16 decimals) and nexVar is a string 32 chars.; 
*Caviat, 32 bytes, not chars but in your case it should be the same as number of chars.;
infile "pathname" eov=eov filename=filename ENCODING="WLATIN1" dsd dlm='09'x truncover LRECL=32767; 
input nextVar  var1 var2 ; *The order the data is in the text file.;
Run;
Tom
Super User Tom
Super User

You have two problems. 

Let's answer the one in your subject line first.  If you have multiple files in the same directory with similar names such that you can replace part of the name with a wildcard and find all of the files (and only those files) then the INPUT statement will take care of reading multiple files.  Especially as it sounds like you are saying the individual files do NOT have header lines, just lines of data.

For example something like this:

infile '/mydirectory/basename*.txt'  dsd dlm='09'x truncover;

Your second problem is understanding what a dataset is, what a variable is, how to define variables in SAS and how to use the INPUT statement to read values from a text file.  A dataset in SAS contains variables. Each variable is either a fixed length character string or a floating point number.   Unlike a spreadsheet (or a text file) every observation of the dataset has the exact same variables, just a different set of values for those variables. 

 

In general to read from a delimited file (your question makes it sound like your source files are tab delimited text files) you just need to use a simple data step like this:

data want;
  length var1 8 var2 $20 .... varlast 8 ;
  infile ... ;
  input var1 -- varlast;
run;

So basically define the dataset you are creating, then define the variables in the dataset, define where you are reading from, and then read the values.  If you define the variables in the order they appear in the text file then the input statement can be as simple as this example where it is using the double hyphen to specify a variable list based on the variables position in the dataset.

If you have variables where reading the value from text requires a special effort (like DATE, TIME or DATETIME values) then you can add a INFORMAT statement to associate a special informat with those variables.  SAS can read most numbers and character strings without being told to use a special informat.  If the values need a special format attached to make the values understandable (or perhaps just consistent) when convert to strings for display then you can attach formats with a FORMAT statement. Again SAS knows how to display normal numbers and character strings so only things like DATE, TIME or DATETIME values will require attaching a format. 

 

Now if your problem is more complicated you might want to use the FILENAME= option or perhaps the FILEVAR= option on the INFILE statement.  But you will need to describe more details of your actual problem to know if either of those are needed.

 

 

 

Kurt_Bremser
Super User

Since you're new to SAS, have you already worked through the free online Programming 1 course? If not, I strongly recommend to do this, as it will be hard to otherwise learn all the SAS basics by trial-and-error and getting help here.

Reeza
Super User
Start with a single file. Once you have that file read in correctly, then post back with that code and we can help you generalize it.
allywang
Fluorite | Level 6

Thank you all for your help- I was thrown to the wolves a bit and have to learn on the fly so I am planning to take the online course on my own time. As for the Code I have attached my single file one below- I ran it and for some reason it still is coming back as "." Also note: my variables that are not numeric or alphanumeric- I am confused on how I have to denote this? Also I am trying to get all the file names into one column once it comes from a folder- i assumed the retain variable one does that 

 

data name; 
length; 
filename $600
Variable2 3
Variable3 $20 
..... 
*the $ are for my alphanumeric variables and this goes on for 85 more variables 
Retain Variable 1; 
infile "filepath" eov=eov filename=filename ENCODING= "WLATIN1" dsd dlm='09'x Missover LRECL=32767; 
input

filename $
Variable2 3
Variable3 $
..... 
run; 

 

 

allywang
Fluorite | Level 6

I'm sorry I meant put the code in sas below- can you see it in my former reply? 

 

Reeza
Super User
Can you attach a small example of the file? If you can't attach the real one, take the original make a cop and keep a few lines of data. Then make the data fake/nonsense and post that.
allywang
Fluorite | Level 6

I can attach this file- which is in the same format as the others. The single file was too large to attach on its own so I cut it down. Also I am able to input column names but not values as a side note

ballardw
Super User

@allywang wrote:

I'm sorry I meant put the code in sas below- can you see it in my former reply? 

 


If your data is not particularly sensitive then copy a few rows, 5 to 10 from your text file that has been opened in a plain text editor. In this forum open a code box using the {I} icon and paste the text. That is the best way to share raw text data.

If you have some sensitive data use your plain text editor (Notepad, the SAS editor, what have you ) and type over the sensitive values with a different character to mask the value, then copy and paste into the cod box.

Similar post the entire program text in a different code box.

Example in your data:

John Smith

Type over to create:

XXXX XXXXX

Do not paste data directly into the message windows as the forum software will reformat things like blanks and potentially change some characters like tabs to spaces.

Tom
Super User Tom
Super User

If you are using the FILENAME= option on the INFILE statement to indicate which variable SAS should store the name of the file currently being read then do NOT also include that variable in the INPUT statement.  If the data file(s) actually include values that you want to read into the variable FILENAME then do not include the FILENAME= option on the INFILE statement, Or use a different variable name.

 

If you have defined the variables already there is no need to add the $'s into the INPUT statement.  That is only needed if the INPUT statement is the first place you have used the variable name since without the $ SAS define the new variable as numeric.  It is not needed when the variable is already defined.

 

You probably did not intend to set the value of VARIABLE2 to just the value of the third character on the line.  If you data is delimited by tabs then you want to use list mode input style instead of telling SAS which character positions on the line to read.

 

Perhaps you meant to use the 3. informat instead of telling SAS to read from column 3?  But if your file is delimited then you do not want to use formatted style input either.  Just list mode.  You do NOT need to add format specifications in the INPUT statement, unless the variable is one that requires it.  Like a DATE, TIME or DATETIME value.  And in that case with use the INFORMAT statement to tell SAS what informat to use. Or if you do include an informat specification in the INPUT statement then add a : (colon) before the informat so that the input statement will still honor the delimiters in the line.

heffo
Pyrite | Level 9

Start small and add things when it works. So, the first version could be this:

data name; 
length 
variable1 $600
Variable2 3
Variable3 $20 
;
*Just the first couple of variables to see that the code work.
You had stopped the length statement to early with the semi colon.; infile "filepath" dsd dlm='09'x Missover LRECL=32767; input variable1 Variable2 Variable3 ; *No need to define the type again in the input statement, you already done that in the length statement.; run;
ballardw
Super User

@allywang wrote:

I am trying to concatenate around 200+ .txt files into a single data set where they are stacked on top of each other and also insert column headers/variable names. There are no column names in the data as its a .txt file of tabulated data but all should follow a ordered format which aligns with column headers that I have.[ all my variables in after input@ are in this order] I also want the file name to be another variable column header in the data set. I know there was a post about appending a .txt folder similar to this but for some reason my code is not producing the right results? I opened the created data set and got "."in all my fields. Does this have something to do with informat/format use?  Also it seems there is a length cap on variable names, is the format simply $20. to extend the width? [I am sorry, I am a new SAS user and trying to wrap my head around the nuances so forgive me if I am making big simple mistakes] 

 

I have pasted my code below. I have redacted my file and variable names 

 

libname peer "pathname";

data name;
length filename $600;
infile "pathname" eov=eov filename=filename ENCODING="WLATIN1" dsd dlm='09'x truncover LRECL=32767;
input@
Variable 1
Variable 2: $20.
83 more variables 


Run;



The example txt file you posted does NOT use '09'x (tab) as delimiter. It is using the '|' character. Something similar to

 

data name; 
length filename $600; 
infile "pathname\*.txt" eov=eov filename=filename ENCODING="WLATIN1" dsd dlm='|' truncover LRECL=32767; 
input
Variable1
Variable2 : $20. 
 /* other variables*/
; /* don't forget to end the input*/

Run;

Has a slightly better chance of running.

You variable names cannot have spaces in them. As previously mentioned if you have documentation to set the informat and variable names that mean something you would be a tad ahead.

 

When you get this working at a minimum you may want to consider, based on your data description, using some custom informats. You have a field that has ranges such as 65-74 (looking amazing like 10 year age intervals) that include 9999, which isn't. If the 9999 represents missing data you should consider reading that as missing value. Similar with the NA values. You have at least one field that mixes NA with spaces, so that might require some consideration as to coding

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1841 views
  • 7 likes
  • 6 in conversation