BookmarkSubscribeRSS Feed
imanojkumar
Fluorite | Level 6

Suppose I have a very large csv which i divided into 3 parts and named them part1.csv, part2.csv and part3.csv

Each of these have the same column names (which is obvious, as being part from a larger one).

There are some 30 colmns in the CSV (say, Col1, Col2, Col3..... Col30).

 

All I want to do is to import these CSV into one dataste (say, MyData) and I want to import only few columns (say, Col1, Col13, Col20, Col 25, Col28, Col29, and Col30).

 

All columns contain only numeric values.

 

How can I do that using 'macro' or any other 'alternative' .

I just have started learning SAS so need lots of help.

 

Thanks.

8 REPLIES 8
TomKari
Onyx | Level 15

You could do it with the various tasks, but if your csv data is well formed the following code may work just fine, and it's very short.

 

Just use it for each of your three datasets. To glue them together, there is an "Append Table..." task in EG, or you can use PROC DATASETS, which has an append function.

 

Tom

 

data want;
	infile "dir1/dir2/dsn1.csv" lrecl=32767 dlm="," missover dsd firstobs=2;
	input Var01 Var02 Var02 ... Col30; /* Type out all of the column names. I just used ... because I'm lazy */
	keep Col1 Col13 Col20 Col 25 Col28 Col29 Col30;
run;
imanojkumar
Fluorite | Level 6

Thanks for replying and helping me.

 

I will really be thankful if you could explain me the code also.

For example "

infile "dir1/dir2/dsn1.csv" lrecl=32767 dlm="," missover dsd firstobs=2;

why 'missover' and what are 'lrel=32767' & 'dsd' ??

 

Can infile take more than one file (as in your case, there is only "dsn1.csv" whereas if I have three files to merge into one, then?)

assume that i have dsn1.csv, dsn2.csv and dsn3.csv and want them to merge into one (all column structures are same in three csvs).

TomKari
Onyx | Level 15

By default, SAS will proceed to the next record if there isn't the expected amount of data on the current record. MISSOVER tells SAS to just set the variables to missing instead.

 

The LRECL is something I always put in, unless I have a reason to do otherwise. Sometimes SAS decides the record length is shorter than it actually is, and causes me grief.

 

FIRSTOBS=2 tells SAS to start reading at the second record...assuming that the first record is column names. You would want to remove this on the file(s) that don't have column names on the first line.

 

For the rest, it's time for you to start reading the SAS documentation, as it takes a little too long to explain in a post. Theoretically, it is possible to read from multiple O/S files in one INFILE statement, but I haven't done it in so long that I'm not comfortable jotting down the syntax. Again, hit the documentation if you really want to do this.

 

Tom

 

imanojkumar
Fluorite | Level 6
Thanks you so much. I got the ideas.. from here, I will take on 🙂
ballardw
Super User

One question:

 

Why did you break the data into three files to begin with? SAS will read very large files and splitting them just means that you have to put them back together for your purpose.

imanojkumar
Fluorite | Level 6

Hi

Thanks for noticing and helping.

 

In my case, my SAS is not allowed to read any csv which has more than 1 million rows, so if I have 2.8 Million rows, I have to trunc them with 1 million x 2 and rest into 3rd CSV file.

 

I dont know why so, may be admin reasons or something else but not sure.

 

Regards,

ballardw
Super User

@imanojkumar wrote:

Hi

Thanks for noticing and helping.

 

In my case, my SAS is not allowed to read any csv which has more than 1 million rows, so if I have 2.8 Million rows, I have to trunc them with 1 million x 2 and rest into 3rd CSV file.

 

I dont know why so, may be admin reasons or something else but not sure.

 

Regards,


I suggest investigating as to where this limit comes from. Since you have at least one business case where your file is larger than the limit it may be that a default was accepted somewhere without realizing that it has negative impacts on actual work flow. I would bring up that many of the methods for "splitting" files have the potential for losing data or introducing inconsistencies. For example if using Proc Import to read each of the three separate files it is quite likely that character varaibles could end up with different lengths and some variables could be numeric in one data set and character in another after import. Which will cause numerous problems when combining them. [I pretty much only use Proc Import to build skeleton code that I modify to control variable lengthns, types and formats.]

 

 

imanojkumar
Fluorite | Level 6
This is a valid point. Thank you so much. I will try to explore it from pears at here.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 10186 views
  • 4 likes
  • 3 in conversation