BookmarkSubscribeRSS Feed
DivyaGadde
Fluorite | Level 6

Hi! I am trying to import a huge dataset with 55M records and 150 variables into SAS. I used proc import statement to import from notepad to SAS. I had no issues with the process, haven't got any errors but I noticed that data is not getting loaded into the cells. when I tried to see the cause, I ve noticed that there is address variable in the data file with the format : Address1|Address2|City|ZIP that is causing the truncation of the data.

I have used double pipe as the delimiter to import the file into SAS but unfortunately SAS interprets the single pipe(“|”) within the data fields, especially in the address column, as a delimiter, which results in truncation of some information during the import process.

Could someone help me how to deal with this issue? Thanks in Advance!

3 REPLIES 3
LinusH
Tourmaline | Level 20

SAS can't handle delimeters OOTB with more than one character.

If you can recreate the file using quotation around your char variables, and a single char as a delimter, it would hopefully work.

Your other option is to write your own program to deal with this record layout, using a data step.

Data never sleeps
ballardw
Super User

Just for clarification: Your file is using a two-character delimiter, ||, between variables and you have one variable that has | characters in the middle?

 

A data step can use the DLMSTR option to use the || between variables. You would likely want to parse the address information with SCAN after reading it into a single variable.

 

Data truncation with Proc Import can be the result of not setting a large enough value for the GUESSINGROWS option. By default the procedure only examines a very small number of rows to guess how long variables may be. A larger setting of Guessingrows can tell the procedure to examine up to 32K rows before setting the length of the variables.

 

Do you have a document that describes the layout of the file? That may be a great help to write a data step to read the file.

Tom
Super User Tom
Super User

Clarify what you mean by "import".  Did you attempt to use PROC IMPORT to read the file?

Or did you instead just write a dats step to read the file?

 

You will have much more success writing your own data step.  You can set the variable names, types and storage lengths directly instead of forcing SAS to make guesses about what you want them to be.

 

And in your case you could DLMSTR= option on the INFILE statement instead of the DLM= option.

 

The DLM= option accepts a string of character where every character is a possible delimiter.  So specifying DLM='||' is the same thing as specifying DLM='|'.

 

The DLMSTR= option accepts a string that must match exactly.

 

But if the address field you mentioned always has 3 internal | characters on every observation then you could read the file using just | as the delimiter, only read the "address" field as four separate fields instead.  But if some of the observations have less than 3 | characters in that field then you cannot do that.

 

Writing a data step to read a delimited file is simple in SAS.  Just define the variables. Attach any needed informats or formats or labels.  Then using and INFILE statement and an INPUT statement.  If you define the variables in the order they appear in the text file then the INPUT statement can just use a simple positional variable list.  If you don't know what names to use for the variables just copy the header line from the file and convert the header text into variable names.  You can do a much better job of creating usable names from the header text than PROC IMPORT's guesses will be.

 

So your program will look something like this:

data want;
  infile 'myfile.txt' dsd dlmstr='||' firstobs=2 truncover ;
  length id $20 name $50 date 8 age 8 address_list $500 other 8;
  informat date date.;
  format date date9.;
  input id -- other;
  length address1 address2 $200 city $40 zip $10 ;
  address1=scan(address_list,1,'|','m');
  address2=scan(address_list,2,'|','m');
  city=scan(address_list,3,'|','m');
  zip=scan(address_list,4,'|','m');
run;

 

Any value that has embedded delimiter stings (|| in this example) will need to be quoted.  So if the address list only had CITY and ZIP so that it was coded as 

||My City|12345

In the text file it should look like:

previous field||"||My City|12345"||next field

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 772 views
  • 3 likes
  • 4 in conversation