Hello,
I have hundreds of .txt files with the following data structure:
"date, (mm/dd/yyyy)", "weigh, (lb)", "height, (cm)",
"sys blood pressure", "dia blood pressure", "heart rate", "temperature"
"01/10/2011", "140", "170", "120", "85", "90", "99.2"
"01/12/2011", "139", "170", "126", "90", "96", "98.9"
............................................................................
.............................................................................
The first 2 rows are the variable names (my files actually have 5 such rows, but I listed only 2 here for simplicity). Each .txt file contains one subject's yearly info, and the file name is the subject's id (for example, 001002).
I need to read the hundreds of .txt files into one SAS data set and store each file name (the id) in a variable (id) in that data set. Can anyone help?
Thanks in advance,
You can use wildcards in your infile statement, so get the code to read it for one file and then use something like the following:
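data input;
length source $256; /* declare a length so the file name is not truncated */
infile 'C:\this is my path\*.txt' filename=source;
input blah blah blah; /* placeholder: replace with your real input statement */
ptid=source; /* source is not written to the data set, so copy it */
run;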
This topic has been discussed many times, and a search will show you a lot more.
Below is code I wrote a long time ago to read in my downloaded files:
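filename fn pipe 'dir h:\temp\*.txt /s /b'; /* list every .txt file, one full path per line */
data want;
infile fn truncover;
input name:&$100.; /* path of the next file to read */
infile in filevar=name end=last truncover firstobs=6; /* open that file, start at row 6 */
do while(not last); /* test before reading, so an empty file is skipped */
input x$10.;
output;
end;
run;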
Notes: 1. Change the folder to wherever your files are located. 2. firstobs= sets the first row to read. Since you mentioned that you have 5 rows of variable names before the data, I set it to 6. 3. input x$10. needs to be replaced with your real variable names and informats.
Haikuo
I am working under Unix, so the pipe '........../s /b' does not seem to work. I will try to find the equivalent Unix command.
As for the input statement: as I mentioned earlier, the first 5 rows hold about 50 variable names in each file. Is there a way to avoid typing them in manually?
Thanks
For the pipe, try 'ls /unix/dir/*.txt'.
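For reference, here is a minimal sketch of how the earlier filevar= code might look on Unix with that ls command. The directory /unix/dir, the variables v1 and v2, the $30. informat, and firstobs=2 (a single header row) are placeholders for illustration, not the poster's actual layout.

```sas
/* Assumption: /unix/dir is a placeholder directory */
filename fn pipe 'ls /unix/dir/*.txt';

data want;
    length name $256;
    infile fn truncover;
    input name $256.;                 /* one full path per line from ls */
    /* "in" is only a placeholder fileref; filevar= decides which file is opened */
    infile in filevar=name dsd truncover firstobs=2 end=last;
    do while (not last);
        input v1 :$30. v2 :$30.;      /* placeholder variables: replace with your own */
        output;
    end;
run;
```

Because the file is reopened for every path, firstobs= is applied to each file in turn. The filevar= variable (name) is not written to the output data set.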
Thanks SAS_Bigot, the code worked.
Does anyone know how I can store the .txt file name (the subject id) in a variable (ID) in the final SAS data set?
Reeza's earlier reply uses the filename= option on the infile statement to capture the name of the file that the infile statement is currently processing. However, you cannot assign source directly to ptid as in that example, because the value of source will contain the file extension (.txt) and may also contain the full path (directories) of the file. So I modified the code to strip the leading path and the trailing extension, leaving only the subject id:
patid = scan( scan( source, -1, '/' ), 1, '.' );
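Put together with the wildcard approach, the whole step might look like the sketch below. The path /unix/dir, the variables v1 and v2, and the $30. informat are assumptions for illustration. Since firstobs= applies only once when a wildcard is used, the sketch skips the header by checking whether the file name has changed; it assumes exactly one header row per file.

```sas
data want;
    length source prev $256 patid $20;
    retain prev;
    /* Assumption: /unix/dir is a placeholder directory */
    infile '/unix/dir/*.txt' dsd truncover filename=source;
    input @;                          /* load the next line and hold it */
    if source ne prev then do;        /* first line of a new file: the header row */
        prev = source;
        delete;
    end;
    input v1 :$30. v2 :$30.;          /* placeholder variables: replace with your own */
    patid = scan(scan(source, -1, '/'), 1, '.');   /* drop the path, then the extension */
    drop prev;
run;
```

For a file named /unix/dir/001002.txt, the inner scan returns 001002.txt and the outer scan returns 001002. On Windows the delimiter in the inner scan would be '\' instead of '/'.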
Thanks SAS_Bigot, again the code worked.
Hi Haikuo,
Yes, all my variables have the same informat. After reading the data into a SAS data set, I also realized that the labels (variable names) are in the first row only, not the first 5 rows.
The labels look like this (they are not in English) and are comma-delimited. Do you have any suggestions?
"Naam","Bloeddruk, voor/na, Resultaat (Voor )","Bloeddruk, voor/na, Resultaat (Na )","Gewicht, voor/na (Voor )","Gewicht, voor/na (Na )","Geslacht","Dialysaatflow (Voorgeschreven )","Diabetes","Lengte patiënt","Totaal UF volume (vocht verlies)","Type behandeling (Voorgeschreven )","Kunstnier (Voorgeschreven )","Toegang tot de bloedbaan (Voorgeschreven )","Streefgewicht (Voorgeschreven )","Natrium in dialysaat (Voorgeschreven )","NeoRecormon, Sterkte (Voorgeschreven )","eprex (Levering via AZM), Sterkte (Voorgeschreven )","Mircera, Sterkte (Voorgeschreven )","aranesp, Sterkte (Voorgeschreven )","Behandelingsschema, Shift (Voorgeschreven )","Bloedflow, gemiddeld (Voorgeschreven )"
Another question: when I use the input statement input v1$ v2$ v3$........... to read in the following lines:
"18.04.08","","","","","","","","","","","","","","","","","","60 mcg","",""
"11.07.08","","","","","","500","Nee","","","Haemodialyse","F8 HPS Hemoflow (thuisdialyse)","","","","","","","","","350"
the values "Haemodialyse" and "F8 HPS Hemoflow (thuisdialyse)" get truncated. Is there an infile option that can take care of this without an informat statement?
My typical suggestion is to run the GUI import procedure from SAS once, copy the generated code from the log, and adapt it.
That usually gets me what I want.
Without specifying the length of a variable beforehand or using an informat on the input statement, SAS assumes the variable will be a length of 8. You can modify your input statement for those 2 variables to accommodate longer text strings such as: v11 :$30. v12 :$30. This tells SAS to read up to 30 characters for variables v11 and v12. You'll notice that SAS assigns a length of 30 for those 2 variables.
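A small self-contained sketch of this, using in-stream data taken from the sample line above. The data set name demo is made up for illustration, and the length statement on v12 is shown only as an alternative to an informat on the input statement.

```sas
data demo;
    infile datalines dsd truncover;
    length v12 $30;                   /* alternative: declare the length up front */
    input v1 $ v11 :$30. v12 $;       /* v1 keeps the default length of 8 */
    datalines;
"11.07.08","Haemodialyse","F8 HPS Hemoflow (thuisdialyse)"
;
run;

proc contents data=demo;              /* shows the length assigned to each variable */
run;
```

The dsd option treats the commas as delimiters and removes the surrounding quotes, and the colon modifier makes SAS stop reading at the next delimiter rather than always consuming 30 characters.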
"Is there a way I don't need to manually type them in?"
Not in your case. If all of your variables have the same informat, you could first read the first 5 rows into a macro variable (sometimes with other manipulations, such as removing quotes and commas), then use it in downstream data steps.
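A rough sketch of that idea, under several assumptions: the header is a single comma-delimited row of quoted labels, the data were already read into a data set named want with variables v1, v2, ..., and /unix/dir/001002.txt stands in for any one of the files. The names labels, pos, lbl, and label_list are made up for illustration.

```sas
/* Step 1: split the header row into one observation per label */
data labels;
    infile '/unix/dir/001002.txt' truncover obs=1 lrecl=32767;
    length lbl $200;
    input;                                        /* load the header row into _infile_ */
    n = countw(_infile_, ',', 'q');               /* commas inside quotes do not split */
    do pos = 1 to n;
        lbl = dequote(scan(_infile_, pos, ',', 'q'));
        output;
    end;
    keep pos lbl;
run;

/* Step 2: build v1="..." v2="..." pairs in a macro variable */
proc sql noprint;
    select catx('=', cats('v', pos), quote(strip(lbl)))
        into :label_list separated by ' '
        from labels
        order by pos;
quit;

/* Step 3: attach the labels to the existing data set */
proc datasets lib=work nolist;
    modify want;
    label &label_list;
quit;
```

The labels are attached as variable labels rather than used as variable names, since text such as "Bloeddruk, voor/na, Resultaat (Voor )" is not a valid SAS name. Labels that contain & or % could trigger macro resolution and would need extra care.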
Haikuo
This thread was very helpful in solving my problem today. Thank you for sharing your code, Haikuo!