Reading in Multiple Files from Tenth Row

Reply
New Contributor
Posts: 4

Reading in Multiple Files from Tenth Row

Hi,

 

I'm new to SAS and am having a bit of trouble. Here is my task:

 

I need to input roughly 40 files in one folder that was produced by a program. They are labeled as follows:

 

(Experimental Conditon AB AB+BC ControlAB ControlAB+BC)(Setting 1 2 3 or 4)(Trial Number 0 or 1)

So, for example, a few file names would be:

AB1-0

AB+BC2-1

etc

 

I need to read in all of these files and would like to find a succint way to do this without hardcoding it. 

 

The file structure is also difficult to input. The format is .dat. When opened in excel, the file structure is such that the first 10 rows are comments about the data but the actual variable names start in row 11 (there are 7 variables) and there are 100 rows of data. The size of this dataset is the same across all files. 

 

I have spent two weeks trying to figure this out and just get error every time I try to read in the data. Your help would be very appreciated.

 

Esteemed Advisor
Posts: 7,288

Re: Reading in Multiple Files from Tenth Row

Can you attach one of the files and/or post the code you tried? Sounds like reading the file using a datastep, and starting at record #11, should get by the problem. If the files are all in one directory, and are the only ones there with a .dat extension, you should be able to use a pipe to get the filenames and automate the process.

 

Art, CEO, AnalystFinder.com

 

Grand Advisor
Posts: 17,320

Re: Reading in Multiple Files from Tenth Row

[ Edited ]

1. Use a data step to read in the file

2. Use FIRSTOBS to control the row from which you read the data

3. Once you have it working for a single file, post that code and we can help with automating it for all files. 

 

For example, here's a macro that would do this for multiple Excel Files. 

https://github.com/statgeek/SAS-Tutorials/blob/master/Import_all_files_one_type

 

To modify this for your needs, the %import_file would change to use your code from above. 

Then you could continue with the rest of the program. 

 


marianajane wrote:

Hi,

 

I'm new to SAS and am having a bit of trouble. Here is my task:

 

I need to input roughly 40 files in one folder that was produced by a program. They are labeled as follows:

 

(Experimental Conditon AB AB+BC ControlAB ControlAB+BC)(Setting 1 2 3 or 4)(Trial Number 0 or 1)

So, for example, a few file names would be:

AB1-0

AB+BC2-1

etc

 

I need to read in all of these files and would like to find a succint way to do this without hardcoding it. 

 

The file structure is also difficult to input. The format is .dat. When opened in excel, the file structure is such that the first 10 rows are comments about the data but the actual variable names start in row 11 (there are 7 variables) and there are 100 rows of data. The size of this dataset is the same across all files. 

 

I have spent two weeks trying to figure this out and just get error every time I try to read in the data. Your help would be very appreciated.

 


 

 

Respected Advisor
Posts: 3,825

Re: Reading in Multiple Files from Tenth Row

@marianajane

Assuming your .dat files are text files here some sample code demonstrating how you could approach this.

I've used a comma as delimiter. Change to whatever is appropriate for your real data and if you can't make things work then please attach one of your real files so we can understand what you're dealing with.

%let path=c:\temp;
filename dirlist pipe "dir /b ""&path\*.dat""";

data want;
  length _thisFile $500;
  infile dirlist truncover;
  input _thisFile :$500.;
  _thisFile=cats("&path\",_thisFile);
  
  infile _dummy_ filevar=_thisFile end=last dlm=',' dsd truncover firstobs=12;
  do until(last=1);
    input (var1 var2) (:$2.);
    output;
  end;

run;

 

....oh, and change the file extension .dat to .txt as else this site won't allow you to attach such a file :-( 

 

Capture.PNG

New Contributor
Posts: 4

Re: Reading in Multiple Files from Tenth Row

Thank you everyone for your responses! What an incredibly helpful community!

 

I guess I should take this step-by-step, starting first with simply reading one just one file. But, I'm still struggling a bit in simply even just reading the file. I have posted an example of the file (I've changed the extension to .txt as per Patrick's recommendation)

 

filename first '/folders/myshortcuts/AB+BC1-0.dat';

 

data want;
       infile first firstobs=18;
run;

 

I tried a few different delimeters (comma and tab).

 

I get the error:

 

NOTE: 0 records were read from the infile FIRST.
NOTE: The data set WORK.WANT has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
 
WORK.WANT CAN NOT BE OPENED BECAUSE IT HAS NO COLUMNS
 
I only really need to flow in the columns WANT1 and WANT2. Eventually, after the reiterative process of flowing in all the files in the given directory, I would want a matrix that looks something like this:
 
WANT1dataset1     WANT2dataset1     WANT1dataset2     WANT2dataset2 etc....
 
All the datasets have the exact same structue as the file attached. The headers of the final matrix would need to be specific enough for me to tell which dataset the values came from. 
 
I hope this makes sense. Thank you very much!
Esteemed Advisor
Posts: 7,288

Re: Reading in Multiple Files from Tenth Row

SAS will only do what you tell it to do. In this case, minimally, you're not telling it to input anything, Additionally, in the case of the file you posted, you have to tell SAS even more, namely that the file is tab delimited, that the records end with a CR (rather than a CR and LF) and that the data really start at line 19 not 18. Try the following:

filename first '/folders/myshortcuts/AB+BC1-0.dat';
 
data want;
  infile first firstobs=19 delimiter='09'x termstr=cr;
  input WANT1	Dar	Ref	Smp	Tx	WANT2	R300;
run;

Art, CEO, AnalystFinder.com

New Contributor
Posts: 4

Re: Reading in Multiple Files from Tenth Row

Thank you! This code worked.

 

I have one issue in that the first WANT1 variable name was changed for this post to higlight the columns I needed, but in the datasets, it is labeled as "Distance(nm)". SAS does not like this and gives me all "." in my matrix. This is OK because these values are the same across all datasets, so I can do this manually, but any advice about importing this variable as well would be helpful.

 

I have also been referencing the options to automate this process and correctly label WANT2 as associated with a particular dataset (AB+BC1-0 in this case). I assume this would look something like rename WANT2 = WANT2'_thisFile_'?

 

I'm trying to get the pipe to work now. I tried using the code above, but I keep getting the error:

ERROR: Insufficient authorization to access PIPE.
ERROR: Error in the FILENAME statement.
 
The path for the first file is as follows:
'/folders/myshortcuts/ExperimentFiles/AB+BC1-0.dat'
This is what I have: 
 
%let pat'/folders/myshortcuts/ExperimentFiles'; <-?
filename dirlist pipe "dir /b ""&path\*.dat""";

data want;
     length _thisFile $500;
     infile dirlist truncover;
     input _thisFile :$500.;
     _thisFile=cats("&path\",_thisFile);

     infile _dummy_ filevar=_thisFile end=last dlm='09'x dsd truncover termstr=cr firstobs=19;

     do until(last=1);
          input (WANT1 Dar Ref Smp Tx WANT2 R) (:$2.);
          keep WANT2;
         rename WANT2 = WANT2'_thisFile_'; <-?
     output;
     end;
run;
Esteemed Advisor
Posts: 7,288

Re: Reading in Multiple Files from Tenth Row

Are you possibly trying to run your code on SAS UniversityEdition? If so, it doesn't allow one to use pipes.

 

As for the variable name, SAS variables have to start with either a letter or underscore, and can only contain a combination of letters, numbers and underscores. Since you're declaring the names in an input statement, you can meet those conditions regardless of how the variable names are declared in the files (since you're skipping the line that contains them).

 

Art, CEO, AnalystFinder.com

 

Grand Advisor
Posts: 10,204

Re: Reading in Multiple Files from Tenth Row

rename WANT2 = WANT2'_thisFile_'; <-?

 

You will not be able to rename a variable with another variable as part of the new name. Simply not allowed in the syntax, mostly because the BASE variable WANT is the same one and RENAME affects the base variable. If you have multiple renames for the same variable only one will be applied.

 

Actually naming a new variable using variable text is not going to happen very easily.

Most of time processing BY _thisfile would accomplish most of the needed functionality of creating several hundred WANT2 variables.

Grand Advisor
Posts: 17,320

Re: Reading in Multiple Files from Tenth Row

See the code in the link I initially posted (on GitHub) for code that doesn't use pipes but will get you the list of files. 

New Contributor
Posts: 4

Re: Reading in Multiple Files from Tenth Row

Thank you everyone for your help! I realized the limitations of University Edition so I've tried running the same program on the full edition of SAS and it worked. I really appreciate it. 

Super User
Super User
Posts: 6,317

Re: Reading in Multiple Files from Tenth Row

You should be able to read all of the files at once into a single dataset.

Here is the header from the example file you posted.

 

Experiment Readings						
1/1/17 0:00						
AB+BC1-0						
						
						
Acq Mode	Absorbance					
A	65					
B	4					
C	3					
D	3100					
Efg	0					
						
average 	5.882					
Hij	0.363					
Klm	0.272					
Nop	-0.227					
						
WANT1	Dar	Ref	Smp	Tx	WANT2	R
300	955.6786	1349.0714	1122.6786	41.2929	0.3841	0.3841
300.37	950.8929	1352.2142	1124.7142	42.1942	0.3747	0.3747
300.74	946.8929	1352.75	1131.3928	44.3959	0.3527	0.3527
301.11	949.1429	1356.4642	1132.4642	43.9383	0.3572	0.3572

Looks like the first line is a constant text, the second has the runtime and the third has the description.

So we should be able to read all of the files in one data step and just look for that header line to know when a new one starts.

If your file is really using CR only as the end of file marker then update the TERMSTR= option on the INFILE statement.

%let path=your directory name here;
data want ;
  infile "&path/*.dat" dsd dlm='09'x truncover termstr=CRLF;
  input @;
  if _infile_=:'Experiment Readings' then do;
    input / rundate :anydtdtm32. / runname :$32. /////////////// ;
  end;
  retain rundate runname ;
  format rundate datetime20.;
  input WANT1 Dar Ref Smp Tx WANT2 R;
run;

 

Ask a Question
Discussion stats
  • 11 replies
  • 175 views
  • 3 likes
  • 6 in conversation