DATA Step, Macro, Functions and more

An example of importing a csv file into SAS by Data step code

Reply
Contributor
Posts: 52

An example of importing a csv file into SAS by Data step code

I am SAS beginner and trying to import a large .csv file into SAS. Could any one give me an example of importing a csv file into SAS by Data step code please? Beside, do I need any code in data step to ensure the accuracy of imported date?

Super User
Posts: 24,003

Re: An example of importing a csv file into SAS by Data step code

We can't give you code because data step code requires knowledge of the data. You typically need to know the variables, variable type and length for starters, usually noted in the record layout or documentation on the data set. 

 

You can always use PROC IMPORT and then steal the data step code from the log. 

 

A bit of this is written up here:

 

https://stats.idre.ucla.edu/sas/faq/how-do-i-read-in-a-delimited-ascii-file-in-sas/

 


@France wrote:

I am SAS beginner and trying to import a large .csv file into SAS. Could any one give me an example of importing a csv file into SAS by Data step code please? Beside, do I need any code in data step to ensure the accuracy of imported date?


 

Contributor
Posts: 52

Re: An example of importing a csv file into SAS by Data step code

Thanks for your answer. but how could I know the maximum length of each column if the csv file is so large that I cannot check them one by one? Could you give me some codes to check it please?

 

For example, if the Domain of two variables ( separately APPLN_ID and APPLN_TITLE) is 'Number 0 ... 999 999 999' and 'Up to 3000 characters', is that mean the maximum length of two variables are separately 9 and 3000?  Could i import them into sas by following codes?

 

data XXX;

input APPLN_ID: 9. APPLN_TITLE: $3000. ;

format APPLN_ID: 9. APPLN_TITLE: $3000. ;

run;

 

or 

 

data XXX;

informat APPLN_ID 9.;

informat APPLN_TITLE $3000. ;

format APPLN_ID 9.;

format APPLN_TITLE $3000. ;

input 

APPLN_ID

APPLN_TITLE $;

run;

 

or

 

data XXX;

% let _EFIERR_ =0;

inflie 'XXX.csv' delimiter = ',' missover dsd lrecl=32767;

informat APPLN_ID 9.;

informat APPLN_TITLE $3000. ;

format APPLN_ID 9. ;

format APPLN_TITLE $3000. ;

input 

APPLN_ID

APPLN_TITLE $

;

if _ERROR_ then call symputx ('_EFIERR_',1);

run;

 

Which one is right?  what the difference between them? 

thanks in advance.

 

Super User
Posts: 10,570

Re: An example of importing a csv file into SAS by Data step code

[ Edited ]

With guessingrows=max, proc import will check the whole file.

 

But I concur with @Reeza, it's best to write the data step according to the documentation that came with the file.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 52

Re: An example of importing a csv file into SAS by Data step code

Posted in reply to KurtBremser
Thanks for your answer.
Super User
Posts: 24,003

Re: An example of importing a csv file into SAS by Data step code

If this is the question you also posted on SO, my answer there is still correct. 

 


@France wrote:

Thanks for your answer. but how could I know the maximum length of each column if the csv file is so large that I cannot check them one by one? Could you give me some codes to check it please?

 


You typically need to know the variables, variable type and length for starters, usually noted in the record layout or documentation on the data set. 

 

First rule of data analysis: Know thy data.

 

 'Number 0 ... 999 999 999' and 'Up to 3000 characters'

 

I would interpret that as the variable has values ranging from 0 to 999,999,999 in a character variable that has a length of 3000 characters. Why it would be 3000 characters long, I have no idea...seems too big to me by far.

 

I don't have the file or the record layout so there is no way I can say which is right. I can say those are different methods. 

I would suggest taking a  small subset of data, testing it on that and then using it on the full data set once you have it working. 

There are many powershell or other examples on how to retrieve a small subset of a text file. 

 

 

Contributor
Posts: 52

Re: An example of importing a csv file into SAS by Data step code

Thanks a lot, Reeza. the data is PATSTAT database. and I will try by using a small subset of data.
Super User
Posts: 24,003

Re: An example of importing a csv file into SAS by Data step code


@France wrote:
Thanks a lot, Reeza. the data is PATSTAT database. and I will try by using a small subset of data.

Then it has established documentation:

 

PATSTAT – EPO Worldwide Patent Statistical Database data catalog
Description of data model of PATSTAT Raw Data and PATSTAT Online. The data catalog contains detailed descriptions of the tables and all the attributes (fields). Be aware that the data catalog is version dependent. 

 

 

I would echo @KurtBremser solution though, just use PROC IMPORT and set GUESSINGROWS = max. Go home. Check the code the next day and use that as your starting point. 

 

 

 

 

Contributor
Posts: 52

Re: An example of importing a csv file into SAS by Data step code

ok, I have download the data catalog.
Have a good night.
Super User
Posts: 13,941

Re: An example of importing a csv file into SAS by Data step code


@France wrote:

Thanks for your answer. but how could I know the maximum length of each column if the csv file is so large that I cannot check them one by one? Could you give me some codes to check it please?

 


 

 

In many organizations data files, especially large ones often have documentation that has details such as the column heading, the expected data type (character, numeric, date, time or something) and the maximum lengths expected of character variables. Sometimes there will even be a list of acceptable values such a "1 to 5, 7 or 9" or "Apple, Pear, Banana or blank".

Typically the more important the data the more likely such a document exists.

 

Your first two data steps would not work because you do not have an INFILE statement indicating the file name and options needed to describe the file for reading such as a delimiter or the maximum length of the input line to read (needed if the line exceeds the default line length.

 

The last one is more likely to work (and pretty obviously based on Proc Import code) because it does indicate the source file, indicates it is a delimited file and that comma will separate the values (delimiter=','), a maximum expected line length (Lrecl), if the character variables may have commas imbedded (DSD) but if the value enclosed in quotes the comma won't end the variable, and what to do if the data runs out before all of the variables on the input statement are read (missover which  would have any of the variables missing not found in the data, write the record to the output and then start reading the next data record from the first variable).

The lines

% let _EFIERR_ =0;

if _ERROR_ then call symputx ('_EFIERR_',1);

create a macro variable _EFIERR_ that is set to 0 by default and if any errors occur when reading the data then it is set to 1. Other steps could use this information to do other tasks if there is a problem. These lines are included in Proc Import generated code because the SAS developers do not know if the code used will be in a batch environment (no one looking over the results as steps are running) and so you could include additional instructions such as to stop any processing and write a note to the log or such to notify the user there was a problem with the data.

Contributor
Posts: 52

Re: An example of importing a csv file into SAS by Data step code

Many thanks for your answer, ballardw.
Ask a Question
Discussion stats
  • 10 replies
  • 125 views
  • 2 likes
  • 4 in conversation