BookmarkSubscribeRSS Feed
CelineBE
Calcite | Level 5

Hello,

 

There are many posts on import csv files into 1 dataset but none that is generic or universal enough to fit my needs, unfortunately. I tried all the below but each time I get error messages and give up.  

 

https://blogs.sas.com/content/sastraining/2013/09/18/importing-csv-files-into-sas-datasets/

https://communities.sas.com/t5/General-SAS-Programming/Importing-several-files-into-SAS/m-p/241827/h...

https://blogs.sas.com/content/sasdummy/2012/03/23/improving-on-a-sas-programming-pattern/

https://communities.sas.com/t5/Base-SAS-Programming/Importing-a-a-csv-file-to-a-SAS-dataset/m-p/4362...

 

I am looking fo the "ideal generic" macro or code that would:

1. scan all the csv files with non-standard SAS names - like 'Batch A (1-5000)' or '1.SOURCE_DATA CTRY1'
but all the csv files have the same format with in each a first line header but not the same number of rows (25'000 to 50'000 rows). Lot of files to upload.

2. from a specific folder not following SAS naming convention - like C:\Business Analysis\1.Data to upload

3. provide a report or a file on all those file names to upload and the number of rows in each - so that we check quality afterwards

4.Import into 1 dataset for the moment on temporary file (to test) but to become a permanent file  - example Library WORK.ALL_DATA

5. Append all data files in that unique ALL_DATA dataset

6. Report on all the data uploaded to be able to check

 

Is that feasible so that it can be used later by all of us? Here attached sample data for convenience.

Thanks a lot for your help on this matter as it could be a bit challenging beacause I am using SAS egp V7.3.

 

Best regards,

a dummy user 😉

6 REPLIES 6
Reeza
Super User

Your CSV seems pretty standard. What issue are you having importing the files?

 

https://communities.sas.com/t5/SAS-Communities-Library/How-do-I-write-a-macro-to-import-multiple-tex...

 


@CelineBE wrote:

Hello,

 

There are many posts on import csv files into 1 dataset but none that is generic or universal enough to fit my needs, unfortunately. I tried all the below but each time I get error messages and give up.  

 

https://blogs.sas.com/content/sastraining/2013/09/18/importing-csv-files-into-sas-datasets/

https://communities.sas.com/t5/General-SAS-Programming/Importing-several-files-into-SAS/m-p/241827/h...

https://blogs.sas.com/content/sasdummy/2012/03/23/improving-on-a-sas-programming-pattern/

https://communities.sas.com/t5/Base-SAS-Programming/Importing-a-a-csv-file-to-a-SAS-dataset/m-p/4362...

 

I am looking fo the "ideal generic" macro or code that would:

1. scan all the csv files with non-standard SAS names - like 'Batch A (1-5000)' or '1.SOURCE_DATA CTRY1'
but all the csv files have the same format with in each a first line header but not the same number of rows (25'000 to 50'000 rows). Lot of files to upload.

2. from a specific folder not following SAS naming convention - like C:\Business Analysis\1.Data to upload

3. provide a report or a file on all those file names to upload and the number of rows in each - so that we check quality afterwards

4.Import into 1 dataset for the moment on temporary file (to test) but to become a permanent file  - example Library WORK.ALL_DATA

5. Append all data files in that unique ALL_DATA dataset

6. Report on all the data uploaded to be able to check

 

Is that feasible so that it can be used later by all of us? Here attached sample data for convenience.

Thanks a lot for your help on this matter as it could be a bit challenging beacause I am using SAS egp V7.3.

 

Best regards,

Céline goldbaum, a dummy user 😉


 

CelineBE
Calcite | Level 5

Hello,

I would like something up-most automated Where I do not have to enter manually files names, format fields, ... to ease the use. I tried your code but there are missing elements that I need to enter and I do not know them.
As said, these are sample files that do not reflect my real files. I have 59 fields in the file. 😉

Hope this help you.

Thanks,
Best regards,

Kurt_Bremser
Super User

You can automate the import of one file by using proc import (thereby avoiding writing a datastep), or the collecting of all filenames, which requires that you use a data step for the read that you provide. Proc import can only work (reasonably) on one file at a time.

You can use a compound filename:

filename in ('$HOME/sascommunity/test.csv' '$HOME/sascommunity/test1.csv');

proc import datafile=in out=work.test dbms=csv replace;
run;

but the import will have problems because of the header lines in the second (and so on) file(s).

If you use proc import separately on each file, you'll have problems concatenating the datasets because of different column attributes caused by different contents.

 

Bottom line: roll your own data step according to the file specification, and some code to gather all the filenames. You can read all files in one data step by using an automatically created compund filename, but you need to take care of the header lines in the subsequent files (firstobs= works only on the first)

Reeza
Super User

@CelineBE wrote:

Hello,

I would like something up-most automated Where I do not have to enter manually files names, format fields, ... to ease the use. I tried your code but there are missing elements that I need to enter and I do not know them.
As said, these are sample files that do not reflect my real files. I have 59 fields in the file. 😉

Hope this help you.

Thanks,
Best regards,


When you let the computer guess, 'automated', it has a tendency to guess wrong. Especially for messy CSV files. So adding in your headers and fields once saves you from a massive clean up process. And you have no guarantee it will read each file the same way so you'll also have that issue when it comes time to append. Basically you can do it 'automated' if desired, but in the long run it takes longer. 

 

This is set up to import all XLSX file types but you can modify it for CSVs.

 

https://github.com/statgeek/SAS-Tutorials/blob/master/Import_all_files_one_type

 

 

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You have lots of questions and components here.  This is a Q&A forum, we aren't going to do it for you.

1) Several ways, simplest being, if you have access to command line (this gets all csv filename in a directory:

filename tmp pipe 'dir "c:/temp/*.csv" /b';

data want;
   infile tmp;
   length line $2000;
   input line $;
run;

2) No idea what you mean here?

3) You would import the data to find out number of rows, once it is in SAS there are numerous methods to get nobs, sashelp.vtable for instance.

4) This is a mapping excercise, if your data all looks the same, then you might get away with just wildcards:

data all_data;
  infile "c:/temp/*.csv";
  input...
run;

5) No idea what you mean?

6) Again, no information, do you want a print print of the final dataset?

 

I would start by throughly documenting your whole process from start to end, define all the inputs, outputs, then block in sections for mapping, validation steps etc.

Tom
Super User Tom
Super User

It sounds like your problem is easier than those in the linked threads.

You said that all of the files have the same format (layout).  In that case you can create a single data step that reads all of the files into a single dataset.

 

If they are all on the same directory and you can use simple * wildcard to find them all then just use the wildcard in the INPUT statement.  What you need to do is to is make sure to test when the actual name changes to skip the header row.

 

Something like:

data want ;
  length _filename filename $200 ;
  infile 'xxx*.txt'  dsd truncover filename=_filename  ;
  input @;
  filename = _filename ;
  if _filename ne lag(_filename) then input;

  length a b c 8 ;
  input a -- c ;
run;

Otherwise build up a list of filenames in a dataset and then use that dataset to drive the looping in the data step that reads the files.  Skip the header row and then read/write the observations until you reach the end of the current file.

 

Something like this:

data filelist ;
  infile 'ls xxx*.txt' trunover;
  input filename $200. ;
run;

data want ;
  set filelist ;
  _filename = filename ;
  infile csv filevar=_filename dsd truncover end=eof;

  length a b c 8 ;
  if not eof then input ;
  do while (not eof);
    input a -- c ;
    output;
  end;
run;

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1603 views
  • 0 likes
  • 5 in conversation