Re: Confirmation about below behaviour of Infile

shahsn11 · Posted 08-04-2018 01:50 PM

I used infile to read a CSV and store its information in a particular table , lets say Table1.

Now i used another CSV file and try to store its information in the same table. Every time the old content in table Table1 is deleted and new one is inserted.

Is there a way i can append the content of 2 different files into same table Tablel1 ?

I know that i can create one more table and then used it to append it to Table1. But do we have a option in Infile like we have in Proc Import (replace is the option i am talking about).

Thanks.

Kurt_Bremser · Posted 08-04-2018 02:24 PM

Read into a temporary table and then append that to the first one.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

art297 · Posted 08-04-2018 02:47 PM

It can be done, but will probably take longer than using the approach that @Kurt_Bremser suggested:

* Create examples of the two files */

data _null_;
  infile datalines truncover;
  file "/folders/myfolders/base_file.csv" dsd;
  input id fruit :$10.;
  put (_all_) (+1);
  datalines;
1 apple
2 banana
3 coconut
;
run;

data _null_;
  infile datalines truncover;
  file "/folders/myfolders/append_it.csv" dsd;
  input id fruit :$10.;
  put (_all_) (+1);
  datalines;
4 date
5 elderberry
6 fig
;
run;

/* create first dataset */
data have;
  infile '/folders/myfolders/base_file.csv' dsd;
  input id fruit :$10.;
run;

/* append second dataset */
data have;
  set have end=last;
  infile '/folders/myfolders/append_it.csv' dsd end=lastnew;
  output;
  if last then do until (lastnew);
    input id fruit :$10.;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

FreelanceReinh · Posted 08-04-2018 03:20 PM

@shahsn11 wrote:

I know that i can create one more table and then used it to append it to Table1. But do we have a option in Infile like we have in Proc Import (replace is the option i am talking about).

As you probably know, omitting the REPLACE option of PROC IMPORT is just a protection against overwriting an existing dataset. It does not provide for automated appending. If you need this type of protection, have a look at the REPLACE= dataset option.

It is possible to read two or more CSV files in the same data step (see examples in the documentation of the INFILE statement). But for just two files I don't think it's worth the coding effort (as PROC APPEND is so easy to use). If the second CSV file was not yet available when the first one is processed, this wouldn't work anyway. Also, it's safer to read the (possibly small) second file separately and append the resulting dataset to the (possibly large and valuable) cumulative dataset only after checking it carefully.

Tom · Posted 08-04-2018 05:54 PM

Please show what code you are using.

Here is an example of doing what you asked. First let's make a new datasets and at the same time create a CSV file we want to read in and append to it.

filename csv temp;
data file1 ;
 set sashelp.class ;
 file csv dsd ;
 put (_all_) (+0);
run;

So now FILE1 has a copy of SASHELP.CLASS and the filename CSV points to another copy in a comma delimited text file.

Now that I have created a test situation that is similar to your request I can show you how to use a data step that uses the MODIFY command and INFILE to append new observations to FILE1.

data file1 ;
  modify file1 ;
  infile csv dsd truncover ;
  input (_all_) (+0);
  output;
run;

art297 · Posted 08-04-2018 06:19 PM

@shahsn11: While I definitely like the look of @Tom's suggested code over that which I suggested, I should point out that both sets of code do the same thing, but my code runs 76 times faster than @Tom's code.

@Tom: I was actually trying to see how much faster your code would be and was surprised to see that it ran so much slower. Any idea why?

The test I ran was:

/* Create examples of the two files */

data _null_;
  infile datalines truncover;
  file "/folders/myfolders/base_file.csv" dsd;
  input id fruit :$10.;
  do i=1 to 100000;
    put (_all_) (+1);
  end;
  datalines;
1 apple
2 banana
3 coconut
;
run;

data _null_;
  infile datalines truncover;
  file "/folders/myfolders/append_it.csv" dsd;
  input id fruit :$10.;
  do i=1 to 100000;
    put (_all_) (+1);
  end;
  datalines;
4 date
5 elderberry
6 fig
;
run;

/* create first dataset */
data have;
  infile '/folders/myfolders/base_file.csv' dsd;
  input id fruit :$10.;
run;

data want1;
  set have;
run;

/* append second dataset */
data want1 ;
  modify want1 ;
  infile "/folders/myfolders/append_it.csv" dsd truncover ;
  input (_all_) (+0);
  output;
run;

data want2;
  set have end=last;
  infile '/folders/myfolders/append_it.csv' dsd end=lastnew;
  output;
  if last then do until (lastnew);
    input id fruit :$10.;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

Tom · Posted 08-04-2018 06:57 PM

Executing the MODIFY statement on every data step iteration causes extra processing time.

But it turns out you don't have to actually execute it.

/* MODIFY */
data want1 ;
  if 0 then modify want1 ;
  infile extra dsd truncover ;
  input (_all_) (+0);
  output;
run;

art297 · Posted 08-04-2018 07:27 PM

@Tom Still takes twice as long, but a definite improvement!

Thanks,

Art, CEO, AnalystFinder.com

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!