BookmarkSubscribeRSS Feed
Ravindra_
Quartz | Level 8

I am facing issue while importing CSV files to SAS.

 

I had tried to import csv files to SAS few days ago using proc import and the expected format of the variables in the output was character and we have seen them correctly, but yesterday when I had used the same production data few data sets were having numeric and other were having character variables in the output which is not obvious. We were not sure why this is happening.

 

I would be thankful if you can please help us with a solution

 

If possible can someone please suggest a macro that will automatically read the csv files from folder in one go and convert them to SAS. I have got this csv files from Trifacta where this issue is happening. Not sure if i am doing something wring or any issue with the data.

 

Thanks.

18 REPLIES 18
Kurt_Bremser
Super User

If you want consistent results from a data import from text files, DO NOT (I repeat: NOT) use proc import more than once (at maximum once to get a blueprint of the needed data step code).

 

Ideally, you do not use proc import at all, but write the data step yourself, following the documentation you got for your source file(s).

 

But if you're not yet firm with data step coding (something you need to rectify ASAP), you can use proc import once, copy the code that it created from the log, and adapt that code to your needs (as said above, use the documentation you got with the file).

 

 

Ravindra_
Quartz | Level 8

Hi Kurt, Thank you for your reply, i have got almost 28 files to import and i had used proc import in a macro and i am facing this issue. Can you please let me know if you have any macro that can read multiple csv files to SAS without any issue. Thanks, Ravindra.

Ravindra_
Quartz | Level 8
Hi Kurt,

Thank you for your reply, i have got almost 28 files to import and i had used proc import in a macro and i am facing this issue.

Can you please let me know if you have any macro that can read multiple csv files to SAS without any issue.

Thanks,
Ravindra.
Kurt_Bremser
Super User

If you have a number of files with identical structure to import, and the contents of all these files should end up in one SAS dataset, you do not need a macro; you can read all these files in one data step, either by using wildcards in the INFILE statement, or using a dynamic infile name.

So we need to know:

  • do these files have identical structure?
  • do you want the data in separate datasets, or in one dataset?
  • how are the filenames structured?
  • are these files the only files in the source directory, or do we need to exclude other files?

Supplying some example lines to illustrate the structure would be great, along with information taken from the file documentation.

To post textual data, use the </> button:

Bildschirmfoto 2020-04-07 um 08.32.59.png

ChrisNZ
Tourmaline | Level 20

1. @Kurt_Bremser's advice is to be followed: proc import does not guarantee that the files will be imported the way you want.

Only a data step allows you full control.

 

2. Are you using option guessingrows ?

 

3. Do all the files have the same structure?

 

4. This forum has many examples of answers to the issue of reading files from a folder; have you looked at them?

 

 

Ravindra_
Quartz | Level 8

Thank you for reply

 

1) Can you please help me do that using data step as a macro, as we are not allowed to write attributes within data step and there are 28 different files not sure how to handle this. So i have created macrio using proc import and did that.

2) Yes i am using guessingrows option without which there a variable which i was unable to read the whole text in a value, so i am forced to use that.

3)Every file is different having different variable length and formats

4) I have seen the post from past but didn't find that useful to me.

 

Please suggest a solution for this. Thanks for your time. 

ChrisNZ
Tourmaline | Level 20

> but yesterday when I had used the same production data few data sets were having numeric and other

proc import may not allow much flexibility in terms of output, but it will give you the same result for the same file.

Your files changed.

 

2 questions:

> please help me do that using data step .. we are not allowed to write attributes within data step

How does this reconcile?

 

You must choose one: output data sets have a known layout OR use proc import. 

Which one?

 

If the 28 files are different, you need 28 different data steps.

You can use proc import 28 times, which requires less coding, but then the data set might end up being different from what you want or expect.

If you have 28 data steps, a macro is hardly useful.

 

If the data is very similar, one data step might be usable, and you'll have to split the input statements to use depending on the file you read.

Like this person does in their question (don't look at the answers, they are irrelevant to your post).

 

 

Ravindra_
Quartz | Level 8

@ChrisNZ  and @Kurt_Bremser  thanks a lot for taking time in replying on this issue

 

please find the code i had used and do you mind providing a possible solution for this issue. Thank you

 

%MACRO CSV_to_SAS(InData = , OutData =);

libname out "&outdata";

/*proc datasets library=out kill; Quit;*/

/*Find out the files present in the Input directory*/
data yfiles;
length fref $8 filenm $80;
rc = filename(fref, "&indata.");
if rc = 0 then
do;
did = dopen(fref);
rc = filename(fref);
end;
else
do;
length msg $200.;
msg = sysmsg();
put msg=;
did = .;
end;
if did <= 0
then
putlog 'ERR' 'OR: Unable to open directory.';
dnum = dnum(did);
do i = 1 to dnum;
filenm = dread(did, i);
/* If this entry is a file, then output. */
fid = mopen(did, filenm);
if fid > 0
then
output;
end;
rc = dclose(did);
run;

/*Keep only .CSV records from file names and dataset names assigned*/
data filenme(keep = filenm filenm_wosp1 filenm_wosp);
/* Dataset name can consists of maximum 32 characters*/
length filenm_wosp1 filenm_wosp $32.;
set yfiles;
/* Keeping only .CSV records*/
if SUBSTR(UPCASE(TRIM(filenm)),LENGTH(TRIM(filenm))-3) eq ".CSV";

/* Converting Symbols(!@#$%^&()-+=[{}];',.`~ ) from the file name into _(Under score) as Dataset name cannot consist of symbols*/
if indexc(TRIM(filenm),"!@#$%^&()-+=[{}];',.`~ ") > 0 then do;
filenm_wosp1 = translate(substr(filenm,1,length(trim(filenm))-4),"_______________________","!@#$%^&()-+=[{}];',.`~ ");
end;
else do;
filenm_wosp1 = substr(filenm,1,length(trim(filenm))-4);
end;

/* Putting _(Under score) for the file names starting with numbers as Dataset name cannot start with numbers*/
if indexc(TRIM(filenm_wosp1),"1234567890") = 1 then
filenm_wosp = cat("_",substr(strip(filenm_wosp1),1,31));
else
filenm_wosp = filenm_wosp1;
run;

/* If file names are more than 32 charcters and program will convert it into 32 character */
/* dataset name then to avoid the overlapping of datasets due to same name _number(_1,_2..) */
/* would be added in the last to differentiate the name*/

Proc sort data=filenme; by filenm_wosp filenm; run;

data filenme1;
retain count;
set filenme;
by filenm_wosp filenm;
if first.filenm_wosp then count=0;
count+1;
run;

proc sql;
create table filenme2 as select *,count(*) as cnt
from filenme1 group by filenm_wosp having cnt > 0 order by filenm_wosp;
quit;

data filenme3;
set filenme2;
if cnt > 1 then substr(filenm_wosp,(length(filenm_wosp)-length(strip(put(count,best.)))))="_"||strip(put(count,best.));
else filenm_wosp=filenm_wosp;
run;


data _null_;
set filenme3;
/* Renaming file names similar to dataset names(filenm_wosp) as for some file names, it may contain & or %, which will be read as macro statement while importing data and throw an error*/
RC=rename("&indata.\"||strip(filenm), "&indata.\"||strip(filenm_wosp)||substr(strip(filenm),length(trim(filenm))-3), "file");
/* Importing files from the provided path*/
call execute ('PROC IMPORT OUT= Out.' ||strip(filenm_wosp)||
' DATAFILE= "&InData.\'||strip(filenm_wosp)||substr(strip(filenm),length(trim(filenm))-3)||'"
DBMS=CSV REPLACE; GETNAMES=YES; GUESSINGROWS =32767; DATAROW=2; run; ');
run;

/*Renaming the files to original names*/
data _null_;
set filenme3;
RC1=rename("&indata.\"||strip(filenm_wosp)||substr(strip(filenm),length(trim(filenm))-3), "&indata.\"||strip(filenm), "file");
Run;

%mend;

Kurt_Bremser
Super User

I suggest that you start with creating the necessary data steps for each separate file layout.

You can wrap all these codes into one macro by using a parameter for this:

%macro import(infile=,outds=,layout=);
%if &layout = 1 %then do;
/* code for first layout */
%end;
%else %if &layout = 2 %then %do;
......
%end;
%else %put Wrong layout code!;
%mend;

In your code that reads the filenames and selects certain files, associate certain filenames with their layouts, and call the macro accordingly.

Ravindra_
Quartz | Level 8

thanks Kurt i will try this

Tom
Super User Tom
Super User

You still haven't explained what the actual problem is.  As others have said if you run PROC IMPORT twice on the EXACT SAME input file you will get the EXACT SAME output file.

 

So if you are seeing differences it is because of one of two issues.

1) You are getting multiple copies of the same TYPE of file.  So you have one from today and one from yesterday. Or one from company A and one from company B.  Even if they have the same columns in the same order then because there is NO information in a CSV file about the data type or length of the variables PROC IMPORT must guess. So different data yields different guesses.

 

2) You are getting multiple different data files and have one or more common variables and you would like the variables to have the same definition across all of the files.

 

So which describes your situation? (or do you have both things going on?)

Kurt_Bremser
Super User

If, for a given file, you need consistent structure because datasets need to be combined later, then you MUST NOT(!) use proc import. PERIOD. It is IMPOSSIBLE to get consistent results with proc import, as each separate run makes guesses about structure from the data it sees, and different data (lengths of strings, for example) WILL result in different variable lengths. This is what you have experienced, and to remedy this, you HAVE TO use a consistent data step for each file layout. There is no way around this.

ballardw
Super User

@Ravindra_ wrote:

Thank you for reply

 

1) Can you please help me do that using data step as a macro, as we are not allowed to write attributes within data step and there are 28 different files not sure how to handle this. So i have created macrio using proc import and did that.

 


Do you have some document that tells you what the content of each of these files is supposed to look like such as length of variable, date and expected appearance, column order? If you have such a document then using that to write the code should be acceptable.If not you may need to talk to who ever issued that policy about the manpower hours this is going to cost in "fixing" data that should have been read correctly one time.

 

If no such document exists and you are going to guess every time you read files then you should bring that to the attention of management because you have no way to know what "correct" data actually would look like after it was read.

Ravindra_
Quartz | Level 8

@Tom and @ballardw  thanks for your concern, i am having the same concern about our company having such policies and not sure how to handle this. I will suggest them to use a metadata having defined structure where we can look into and generate output accordingly.

 

In this regard, i am having a small query, is it possible to automatically pick the metadata information having variable attribute information and map that with csv data and import that to SAS dataset. If it is possible how do we do that.

 

Thanks again.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 2730 views
  • 2 likes
  • 6 in conversation