BookmarkSubscribeRSS Feed
shahsn11
Fluorite | Level 6

I used infile to read a CSV and store its information in a particular table , lets say Table1.

 

Now i used another CSV file and try to store its information in the same table. Every time the old content in table Table1 is deleted and new one is inserted.

 

Is there a way i can append the content of 2 different files into same table Tablel1 ?

 

I know that i can create one more table and then used it to append it to Table1. But do we have a option in Infile like we have in Proc Import (replace is the option i am talking about).

 

Thanks.

7 REPLIES 7
art297
Opal | Level 21

It can be done, but will probably take longer than using the approach that @Kurt_Bremser suggested:

* Create examples of the two files */

data _null_;
  infile datalines truncover;
  file "/folders/myfolders/base_file.csv" dsd;
  input id fruit :$10.;
  put (_all_) (+1);
  datalines;
1 apple
2 banana
3 coconut
;
run;

data _null_;
  infile datalines truncover;
  file "/folders/myfolders/append_it.csv" dsd;
  input id fruit :$10.;
  put (_all_) (+1);
  datalines;
4 date
5 elderberry
6 fig
;
run;

/* create first dataset */
data have;
  infile '/folders/myfolders/base_file.csv' dsd;
  input id fruit :$10.;
run;

/* append second dataset */
data have;
  set have end=last;
  infile '/folders/myfolders/append_it.csv' dsd end=lastnew;
  output;
  if last then do until (lastnew);
    input id fruit :$10.;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

FreelanceReinh
Jade | Level 19

@shahsn11 wrote:

 

I know that i can create one more table and then used it to append it to Table1. But do we have a option in Infile like we have in Proc Import (replace is the option i am talking about).


As you probably know, omitting the REPLACE option of PROC IMPORT is just a protection against overwriting an existing dataset. It does not provide for automated appending. If you need this type of protection, have a look at the REPLACE= dataset option.

 

It is possible to read two or more CSV files in the same data step (see examples in the documentation of the INFILE statement). But for just two files I don't think it's worth the coding effort (as PROC APPEND is so easy to use). If the second CSV file was not yet available when the first one is processed, this wouldn't work anyway. Also, it's safer to read the (possibly small) second file separately and append the resulting dataset to the (possibly large and valuable) cumulative dataset only after checking it carefully.

Tom
Super User Tom
Super User

Please show what code you are using.

Here is an example of doing what you asked.  First let's make a new datasets and at the same time create a CSV file we want to read in and append to it.

filename csv temp;
data file1 ;
 set sashelp.class ;
 file csv dsd ;
 put (_all_) (+0);
run;

So now FILE1 has a copy of SASHELP.CLASS and the filename CSV points to another copy in a comma delimited text file.

Now that I have created a test situation that is similar to your request I can show you how to use a data step that uses the MODIFY command and INFILE to append new observations to FILE1.

data file1 ;
  modify file1 ;
  infile csv dsd truncover ;
  input (_all_) (+0);
  output;
run;

 

art297
Opal | Level 21

@shahsn11: While I definitely like the look of @Tom's suggested code over that which I suggested, I should point out that both sets of code do the same thing, but my code runs 76 times faster than @Tom's code.

 

@Tom: I was actually trying to see how much faster your code would be and was surprised to see that it ran so much slower. Any idea why?

 

The test I ran was:

/* Create examples of the two files */

data _null_;
  infile datalines truncover;
  file "/folders/myfolders/base_file.csv" dsd;
  input id fruit :$10.;
  do i=1 to 100000;
    put (_all_) (+1);
  end;
  datalines;
1 apple
2 banana
3 coconut
;
run;

data _null_;
  infile datalines truncover;
  file "/folders/myfolders/append_it.csv" dsd;
  input id fruit :$10.;
  do i=1 to 100000;
    put (_all_) (+1);
  end;
  datalines;
4 date
5 elderberry
6 fig
;
run;

/* create first dataset */
data have;
  infile '/folders/myfolders/base_file.csv' dsd;
  input id fruit :$10.;
run;

data want1;
  set have;
run;

/* append second dataset */
data want1 ;
  modify want1 ;
  infile "/folders/myfolders/append_it.csv" dsd truncover ;
  input (_all_) (+0);
  output;
run;

data want2;
  set have end=last;
  infile '/folders/myfolders/append_it.csv' dsd end=lastnew;
  output;
  if last then do until (lastnew);
    input id fruit :$10.;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Tom
Super User Tom
Super User

Executing the MODIFY statement on every data step iteration causes extra processing time. 

But it turns out you don't have to actually execute it.

/* MODIFY */
data want1 ;
  if 0 then modify want1 ;
  infile extra dsd truncover ;
  input (_all_) (+0);
  output;
run;
art297
Opal | Level 21

@Tom Still takes twice as long, but a definite improvement!

 

Thanks,

Art, CEO, AnalystFinder.com

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 832 views
  • 2 likes
  • 5 in conversation