Desktop productivity for business analysts and programmers

importing only few columns from multiple parts of a big csv

Reply
Occasional Contributor
Posts: 5

importing only few columns from multiple parts of a big csv

[ Edited ]

Suppose I have a very large csv which i divided into 3 parts and named them part1.csv, part2.csv and part3.csv

Each of these have the same column names (which is obvious, as being part from a larger one).

There are some 30 colmns in the CSV (say, Col1, Col2, Col3..... Col30).

 

All I want to do is to import these CSV into one dataste (say, MyData) and I want to import only few columns (say, Col1, Col13, Col20, Col 25, Col28, Col29, and Col30).

 

All columns contain only numeric values.

 

How can I do that using 'macro' or any other 'alternative' .

I just have started learning SAS so need lots of help.

 

Thanks.

PROC Star
Posts: 1,262

Re: importing only few columns from multiple parts of a big csv

Posted in reply to imanojkumar

You could do it with the various tasks, but if your csv data is well formed the following code may work just fine, and it's very short.

 

Just use it for each of your three datasets. To glue them together, there is an "Append Table..." task in EG, or you can use PROC DATASETS, which has an append function.

 

Tom

 

data want;
	infile "dir1/dir2/dsn1.csv" lrecl=32767 dlm="," missover dsd firstobs=2;
	input Var01 Var02 Var02 ... Col30; /* Type out all of the column names. I just used ... because I'm lazy */
	keep Col1 Col13 Col20 Col 25 Col28 Col29 Col30;
run;
Occasional Contributor
Posts: 5

Re: importing only few columns from multiple parts of a big csv

Thanks for replying and helping me.

 

I will really be thankful if you could explain me the code also.

For example "

infile "dir1/dir2/dsn1.csv" lrecl=32767 dlm="," missover dsd firstobs=2;

why 'missover' and what are 'lrel=32767' & 'dsd' ??

 

Can infile take more than one file (as in your case, there is only "dsn1.csv" whereas if I have three files to merge into one, then?)

assume that i have dsn1.csv, dsn2.csv and dsn3.csv and want them to merge into one (all column structures are same in three csvs).

PROC Star
Posts: 1,262

Re: importing only few columns from multiple parts of a big csv

Posted in reply to imanojkumar

By default, SAS will proceed to the next record if there isn't the expected amount of data on the current record. MISSOVER tells SAS to just set the variables to missing instead.

 

The LRECL is something I always put in, unless I have a reason to do otherwise. Sometimes SAS decides the record length is shorter than it actually is, and causes me grief.

 

FIRSTOBS=2 tells SAS to start reading at the second record...assuming that the first record is column names. You would want to remove this on the file(s) that don't have column names on the first line.

 

For the rest, it's time for you to start reading the SAS documentation, as it takes a little too long to explain in a post. Theoretically, it is possible to read from multiple O/S files in one INFILE statement, but I haven't done it in so long that I'm not comfortable jotting down the syntax. Again, hit the documentation if you really want to do this.

 

Tom

 

Occasional Contributor
Posts: 5

Re: importing only few columns from multiple parts of a big csv

Thanks you so much. I got the ideas.. from here, I will take on Smiley Happy
Super User
Posts: 13,023

Re: importing only few columns from multiple parts of a big csv

Posted in reply to imanojkumar

One question:

 

Why did you break the data into three files to begin with? SAS will read very large files and splitting them just means that you have to put them back together for your purpose.

Occasional Contributor
Posts: 5

Re: importing only few columns from multiple parts of a big csv

Hi

Thanks for noticing and helping.

 

In my case, my SAS is not allowed to read any csv which has more than 1 million rows, so if I have 2.8 Million rows, I have to trunc them with 1 million x 2 and rest into 3rd CSV file.

 

I dont know why so, may be admin reasons or something else but not sure.

 

Regards,

Super User
Posts: 13,023

Re: importing only few columns from multiple parts of a big csv

[ Edited ]
Posted in reply to imanojkumar

imanojkumar wrote:

Hi

Thanks for noticing and helping.

 

In my case, my SAS is not allowed to read any csv which has more than 1 million rows, so if I have 2.8 Million rows, I have to trunc them with 1 million x 2 and rest into 3rd CSV file.

 

I dont know why so, may be admin reasons or something else but not sure.

 

Regards,


I suggest investigating as to where this limit comes from. Since you have at least one business case where your file is larger than the limit it may be that a default was accepted somewhere without realizing that it has negative impacts on actual work flow. I would bring up that many of the methods for "splitting" files have the potential for losing data or introducing inconsistencies. For example if using Proc Import to read each of the three separate files it is quite likely that character varaibles could end up with different lengths and some variables could be numeric in one data set and character in another after import. Which will cause numerous problems when combining them. [I pretty much only use Proc Import to build skeleton code that I modify to control variable lengthns, types and formats.]

 

 

Occasional Contributor
Posts: 5

Re: importing only few columns from multiple parts of a big csv

This is a valid point. Thank you so much. I will try to explore it from pears at here.
Ask a Question
Discussion stats
  • 8 replies
  • 215 views
  • 4 likes
  • 3 in conversation