SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Macro for importing multiple txt files with nonsequential names

Accepted Solution Solved
Reply
Contributor
Posts: 40
Accepted Solution

Macro for importing multiple txt files with nonsequential names

Hi guys!

Today I was playing with some data and found out a problem. I have 5 data sets in txt format named roh_1.txt, roh_2.txt, roh_4.txt, roh_8.txt and roh_16.txt. It could also be any other number like _1, _3, _10, _11...
I would like to make macro which will allow me to import those into SAS. I am familiar with infile option so no need for this part. I am posting code for import but this is not an issue. Of course I could rename files to be 1 to 5 but... Well I don't want to. Can you please give some hints.
Thanks

 

data file1 (drop = snp1 snp2 bp_position1 bp_position2 fid pheno nsnp gustp phom phet) ;
length bull_id $15. snp1 $50. snp2 $50. bp_position1 $12. bp_position2 $12.;
infile "C:\Users\MajaZmaja\Desktop\roh_4.txt" delimiter=" " firstobs=2;
input fid bull_id pheno $ chr snp1  snp2  bp_position1  bp_position2  kilobaza nsnp gustp phom phet @@;
run;


Accepted Solutions
Solution
‎09-25-2015 06:23 AM
Super User
Super User
Posts: 7,050

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to MajaFerencakovic

The basic idea of how to loop over an arbitrary list of values in macro code is store the list of values as a delimited list in a macro variable and then iterate from 1 to the number of items in the list using the %SCAN() function to pick out the next item.

 

%macro do_over(list);
%local i item ;
%do i=1 %to %sysfunc(countw(&list));
   %let item=%scan(&list,&i);
... put the code to replicated here ...
%end;
%mend do_over;

%do_over(a1 a3 a457);

View solution in original post


All Replies
Trusted Advisor
Posts: 1,918

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to MajaFerencakovic

If the text files are all in one folder, you can try something like this:

 

infile "C:\Users\MajaZmaja\Desktop\*.txt" delimiter=" " firstobs=2;

Contributor
Posts: 40

Re: Macro for importing multiple txt files with nonsequential names

[ Edited ]
Posted in reply to PaigeMiller

Will this approach produce 5 differently named sas data sets? 

 

Ok I will write like this knowing this is not correct

 

data file_&i (drop = snp1 snp2 bp_position1 bp_position2 fid pheno nsnp gustp phom phet) ;
length bull_id $15. snp1 $50. snp2 $50. bp_position1 $12. bp_position2 $12.;
infile "C:\Users\MajaZmaja\Desktop\roh_&i.txt" delimiter=" " firstobs=2;
input fid bull_id $ pheno $ chr snp1 $ snp2 $ bp_position1 $ bp_position2 $ kilobaza nsnp gustp phom phet @@;
run;
Trusted Advisor
Posts: 1,918

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to MajaFerencakovic

MajaFerencakovic wrote:

Will this approach produce 5 differently named sas data sets? 

No, it does not, it produces one large data set that contains the contents of all 5 input files.

 

However, you can add the FILENAME= option to the INFILE statement, and then the name of the file that you read from is now a variable that you can choose to include in the output data set, and if needed, segregate the 5 input files that way.

 

Example:

 

infile "C:\Users\MajaZmaja\Desktop\*.txt" delimiter=" " firstobs=2 filename=whichfile;

file_used=whichfile;

Contributor
Posts: 40

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to PaigeMiller

well, it looks like I am not fit for this chalenge. Ok so in file1 i will get variable with path of the file... Then I should again manually extract files out. 

This is not what would be useful here.

 

Thank you for your effort

 

 

Super User
Posts: 19,806

Re: Macro for importing multiple txt files with nonsequential names

[ Edited ]
Posted in reply to MajaFerencakovic

The wild card (*) tells SAS to read all files that fit the condition. 

You'll get all files input into a single SAS data set with a new variable, file_used, which indicates which file the observation came from. If you need to split it out further there are many ways, but are you sure you need to, if they all have the same structure?

 

PS. Please embed your code using the little running man icon at the top of the editor, which will format the code as SAS code. It makes it more readable. 

 

 

Contributor
Posts: 40

Re: Macro for importing multiple txt files with nonsequential names

I am sure I have to split it for further analysis and merging with other files Smiley Happy

Regarding this new variable I only got "C:\Users" under variable  file_used

 

 

Trusted Advisor
Posts: 1,918

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to MajaFerencakovic

You probably need a statement like this before FILE_USED is defined.

 

LENGTH FILE_USED $ 256;

Contributor
Posts: 40

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to PaigeMiller

Guys, please, is there a way to do this trough do loop? 

The way you are proposing is fine for putting everything together. 

Somethimes I will need to enter my "resultfile" and I would need files _3, _6, _1002, _34568. Or some other combination from some 40000 result files.  So I would like to write code that will allow me to specify which ones I need and will import them and save them as separate sas data files

Super User
Posts: 19,806

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to MajaFerencakovic
There are many macro's on here that do that.
If you wanted to limit your analysis in the future you could simply apply a WHERE clause in your analysis, otherwise you're going to be combining data sets. It's an inefficient way to work.
where file_used in (3, 6, 1002, 34568);

That being said, yes, you obviously can write a macro, and that solution is on here many times over, you need to search for it Smiley Happy.

https://communities.sas.com/t5/Base-SAS-Programming/convert-large-number-of-text-files-into-SAS-all-...

You can also search lexjansen.com for many papers written on the topic.
Contributor
Posts: 40

Re: Macro for importing multiple txt files with nonsequential names

Sorry, I am usually finding advises here and they were always helpful, however, I dont get this one at all. Must be that my knowledge (lack of it) is a problem

If I use command like you guys suggested I don't get variable like you said I will, no mather what lenght of variable I define. I don't get the whole path so I guess I can't use WHERE.

I was more thinking on defining numbers I need in

%let something = 1 2 5 10;

%macro somethingelse;

 

like here http://stackoverflow.com/questions/11289027/do-loop-with-unequal-increment

 

However I did not managed this one to work because this count is producing problems.

 

Ideas?

Solution
‎09-25-2015 06:23 AM
Super User
Super User
Posts: 7,050

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to MajaFerencakovic

The basic idea of how to loop over an arbitrary list of values in macro code is store the list of values as a delimited list in a macro variable and then iterate from 1 to the number of items in the list using the %SCAN() function to pick out the next item.

 

%macro do_over(list);
%local i item ;
%do i=1 %to %sysfunc(countw(&list));
   %let item=%scan(&list,&i);
... put the code to replicated here ...
%end;
%mend do_over;

%do_over(a1 a3 a457);
Contributor
Posts: 40

Re: Macro for importing multiple txt files with nonsequential names

Ok, this works partly. For example I did this and for first two was ok 

%macro do_over(list);
%local i item ;
%do i=1 %to %sysfunc(countw(&list));
   %let item=%scan(&list,&i);
data file_&i (drop = snp1 snp2 bp_position1 bp_position2 fid pheno nsnp gustp phom phet) ;
length bull_id $15. snp1 $50. snp2 $50. bp_position1 $12. bp_position2 $12.;
infile "C:\Users\Maja Zmaja\Desktop\ASD2015\Btauroh_&i" delimiter=" " firstobs=2;
input fid $ bull_id $ pheno $ chr  snp1 $ snp2 $ bp_position1 $ bp_position2 $ kilobaza nsnp gustp phom phet @@;
run;
%end;
%mend do_over;


%do_over(1 2 8);

But then I got this  

 

ERROR: Physical file does not exist, C:\Users\Maja Zmaja\Desktop\ASD2015\Btauroh_3.

And what ever combination I put in list I get the same result. I tried 

%do_over(4 8);

and still obtained data from Btauroh_1 and Btauroh_2 and that was it

 

 

Super User
Super User
Posts: 7,050

Re: Macro for importing multiple txt files with nonsequential names

Posted in reply to MajaFerencakovic
That's easy. You are referencing the index number instead of the value. In your code use &ITEM to get the value of the current item, not &I which will the order of the item in the list.
Contributor
Posts: 40

Re: Macro for importing multiple txt files with nonsequential names

Yes! Thank you!

I was not aware that I did not change the &i to &item.

 

This works great

 

 

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 15 replies
  • 689 views
  • 0 likes
  • 5 in conversation