Desktop productivity for business analysts and programmers

Macro to scan, report & import non-standard csv files in 1 dataset in SAS egp (v7.13)?

New Contributor
Posts: 2

Macro to scan, report & import non-standard csv files in 1 dataset in SAS egp (v7.13)?


Hello,

 

There are many posts on importing CSV files into 1 dataset, but none that is generic or universal enough to fit my needs, unfortunately. I tried all of the below, but each time I get error messages and give up.

 

https://blogs.sas.com/content/sastraining/2013/09/18/importing-csv-files-into-sas-datasets/

https://communities.sas.com/t5/General-SAS-Programming/Importing-several-files-into-SAS/m-p/241827/h...

https://blogs.sas.com/content/sasdummy/2012/03/23/improving-on-a-sas-programming-pattern/

https://communities.sas.com/t5/Base-SAS-Programming/Importing-a-a-csv-file-to-a-SAS-dataset/m-p/4362...

 

I am looking for the "ideal generic" macro or code that would:

1. scan all the CSV files with non-standard SAS names - like 'Batch A (1-5000)' or '1.SOURCE_DATA CTRY1'.
All the CSV files have the same format, each with a header in the first line, but not the same number of rows (25,000 to 50,000 rows). There are lots of files to upload.

2. from a specific folder not following SAS naming convention - like C:\Business Analysis\1.Data to upload

3. provide a report or a file on all those file names to upload and the number of rows in each - so that we check quality afterwards

4. Import into 1 dataset - for the moment a temporary file (to test), but to become a permanent file - for example library WORK.ALL_DATA

5. Append all data files in that unique ALL_DATA dataset

6. Report on all the data uploaded to be able to check

 

Is that feasible so that it can be used later by all of us? Attached is sample data for convenience.

Thanks a lot for your help on this matter, as it could be a bit challenging because I am using SAS egp V7.3.

 

Best regards,

a dummy user ;-)

Super User
Posts: 24,003

Re: Macro to scan, report & import non-standard csv files in 1 dataset in SAS egp (v7.13)?


Your CSV seems pretty standard. What issue are you having importing the files?

 

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

 


@CelineBE wrote:

Hello,

 

There are many posts on importing CSV files into 1 dataset, but none that is generic or universal enough to fit my needs, unfortunately. I tried all of the below, but each time I get error messages and give up.

 

https://blogs.sas.com/content/sastraining/2013/09/18/importing-csv-files-into-sas-datasets/

https://communities.sas.com/t5/General-SAS-Programming/Importing-several-files-into-SAS/m-p/241827/h...

https://blogs.sas.com/content/sasdummy/2012/03/23/improving-on-a-sas-programming-pattern/

https://communities.sas.com/t5/Base-SAS-Programming/Importing-a-a-csv-file-to-a-SAS-dataset/m-p/4362...

 

I am looking for the "ideal generic" macro or code that would:

1. scan all the CSV files with non-standard SAS names - like 'Batch A (1-5000)' or '1.SOURCE_DATA CTRY1'.
All the CSV files have the same format, each with a header in the first line, but not the same number of rows (25,000 to 50,000 rows). There are lots of files to upload.

2. from a specific folder not following SAS naming convention - like C:\Business Analysis\1.Data to upload

3. provide a report or a file on all those file names to upload and the number of rows in each - so that we check quality afterwards

4. Import into 1 dataset - for the moment a temporary file (to test), but to become a permanent file - for example library WORK.ALL_DATA

5. Append all data files in that unique ALL_DATA dataset

6. Report on all the data uploaded to be able to check

 

Is that feasible so that it can be used later by all of us? Attached is sample data for convenience.

Thanks a lot for your help on this matter, as it could be a bit challenging because I am using SAS egp V7.3.

 

Best regards,

Céline goldbaum, a dummy user ;-)


 

New Contributor
Posts: 2

Re: Macro to scan, report


Hello,

I would like something as automated as possible, where I do not have to manually enter file names, field formats, etc., to ease the use. I tried your code, but there are missing elements that I need to enter and I do not know them.
As said, these are sample files that do not reflect my real files. I have 59 fields in the file. ;-)

Hope this helps.

Thanks,
Best regards,

Super User
Posts: 10,570

Re: Macro to scan, report

You can automate the import of one file by using PROC IMPORT (thereby avoiding writing a data step), or you can automate the collecting of all the filenames, which requires a data step that you write to do the reading. PROC IMPORT can only work (reasonably) on one file at a time.

You can use a compound filename:

filename in ('$HOME/sascommunity/test.csv' '$HOME/sascommunity/test1.csv');

proc import datafile=in out=work.test dbms=csv replace;
run;

but the import will have problems because of the header lines in the second (and so on) file(s).

If you use proc import separately on each file, you'll have problems concatenating the datasets because of different column attributes caused by different contents.

 

Bottom line: roll your own data step according to the file specification, plus some code to gather all the filenames. You can read all the files in one data step by using an automatically created compound filename, but you need to take care of the header lines in the subsequent files (FIRSTOBS= works only on the first).
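A minimal sketch of that approach, assuming the two example filenames from the question and three placeholder columns a, b, c (the real files have 59 fields, so the INPUT statement and informats would need to be written out from the file specification). The EOV= flag marks the first record of each file after the first, and FIRSTOBS=2 handles the first file's header:

```sas
filename in ('Batch A (1-5000).csv' '1.SOURCE_DATA CTRY1.csv');

data all_data;
  infile in dsd truncover firstobs=2 eov=new_file;
  input @;               /* hold the record so we can inspect it       */
  if new_file then do;   /* 1 on the first record of each later file:  */
    new_file = 0;        /* that record is its header line, so         */
    delete;              /* discard it and start the next iteration    */
  end;
  input a :$20. b :8. c :8.;
run;
```

Note that EOV= is not set for the first file, which is why FIRSTOBS=2 is still needed, and that the flag must be reset manually after each test.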

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 24,003

Re: Macro to scan, report


@CelineBE wrote:

Hello,

I would like something up-most automated Where I do not have to enter manually files names, format fields, ... to ease the use. I tried your code but there are missing elements that I need to enter and I do not know them.
As said, these are sample files that do not reflect my real files. I have 59 fields in the file. ;-)

Hope this help you.

Thanks,
Best regards,


When you let the computer guess ('automated'), it has a tendency to guess wrong, especially for messy CSV files. So entering your headers and fields once saves you from a massive clean-up process. And you have no guarantee it will read each file the same way, so you'll also have that issue when it comes time to append. Basically you can do it 'automated' if desired, but in the long run it takes longer.

 

This is set up to import all XLSX file types but you can modify it for CSVs.

 

https://github.com/statgeek/SAS-Tutorials/blob/master/Import_all_files_one_type

 

 

 

 

Super User
Super User
Posts: 9,840

Re: Macro to scan, report & import non-standard csv files in 1 dataset in SAS egp (v7.13)?

You have lots of questions and components here.  This is a Q&A forum, we aren't going to do it for you.

1) Several ways, the simplest being a pipe, if you have access to the command line (this gets all CSV filenames in a directory):

filename tmp pipe 'dir "c:\temp\*.csv" /b';

data want;
   infile tmp truncover;
   input line $2000.;   /* formatted input so filenames with spaces are read in full */
run;

2) No idea what you mean here?

3) You would import the data to find out the number of rows; once it is in SAS there are numerous methods to get NOBS - SASHELP.VTABLE, for instance.
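For instance (a sketch, assuming the combined dataset is WORK.ALL_DATA as named in the original question):

```sas
proc sql;
  select memname, nobs
  from dictionary.tables
  where libname = 'WORK' and memname = 'ALL_DATA';
quit;
```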

4) This is a mapping exercise; if your data all looks the same, then you might get away with just wildcards:

data all_data;
  infile "c:/temp/*.csv";
  input...
run;

5) No idea what you mean?

6) Again, no information; do you want a print of the final dataset?

 

I would start by thoroughly documenting your whole process from start to end: define all the inputs and outputs, then block in sections for mapping, validation steps, etc.

Super User
Super User
Posts: 8,278

Re: Macro to scan, report & import non-standard csv files in 1 dataset in SAS egp (v7.13)?


It sounds like your problem is easier than those in the linked threads.

You said that all of the files have the same format (layout).  In that case you can create a single data step that reads all of the files into a single dataset.

 

If they are all in the same directory and you can use a simple * wildcard to find them all, then just use the wildcard in the INFILE statement. What you need to do is make sure to test when the actual name changes so that you can skip the header row.

 

Something like:

data want ;
  length _filename filename $200 ;
  infile 'xxx*.txt'  dsd truncover filename=_filename  ;
  input @;
  filename = _filename ;
  if _filename ne lag(_filename) then input;

  length a b c 8 ;
  input a -- c ;
run;

Otherwise build up a list of filenames in a dataset and then use that dataset to drive the looping in the data step that reads the files.  Skip the header row and then read/write the observations until you reach the end of the current file.

 

Something like this:

data filelist ;
  infile 'ls xxx*.txt' pipe truncover;
  input filename $200. ;
run;

data want ;
  set filelist ;
  _filename = filename ;
  infile csv filevar=_filename dsd truncover end=eof;

  length a b c 8 ;
  if not eof then input ;
  do while (not eof);
    input a -- c ;
    output;
  end;
run;
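As a possible follow-up for the per-file row counts requested in point 3 (a sketch assuming the WANT dataset above, which carries the source file name in its FILENAME column):

```sas
proc freq data=want;
  tables filename / nocum nopercent;   /* one row per source file, with its count */
  title 'Rows imported per source file';
run;
```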

 

 
