BookmarkSubscribeRSS Feed
Emma2021
Quartz | Level 8
I have a Csv file (number of rows and columns can change)—text data can contain Latin character, open quote, delimiter (,), and line spaced.
Is there a way to read without informat/format and reading all variables by names?
When I used proc import with encoding=‘Any’ the variables were shifted (Some values from some columns headings were shifted to different columns). Thank you.
10 REPLIES 10
Tom
Super User Tom
Super User

Show an example of a line where the values "shift".

If you data has the delimiter character in it (which is normally a comma for a Comma Separated Values file) then those values need to be enclosed in quotes.  Otherwise there is no way to parse the line.

 

Consider the case of a field with variable A,B and C.  A=1, B='2,3,4' and C=5 and you generate a comma delimited files without quoting the value of B you get a line like:

1,2,3,4,5

Which has 5 values, not 3 values.

Once you add the quotes then it has three values:

1,"2,3,4",5

If you have a file that was generated improperly you MIGHT be able to interpret it, but only if the delimiters can only appear in one of the fields.  So in the example you can take A from the left and C from the right and whatever is left over is the value of B.

Emma2021
Quartz | Level 8
I deprecated by |, but still could not import properly. I think it is because there are random line spaces. How can I remove/fix those in Sas? Thank you.
ballardw
Super User

@Emma2021 wrote:
I deprecated by |, but still could not import properly. I think it is because there are random line spaces. How can I remove/fix those in Sas? Thank you.

Post example data of what you mean by "random line spaces".

Open a text box on the forum with the </> icon, copy a few lines that show the behavior from your file opened in a text editor like Notepad or similar, and paste into the window opened on the forum.

 

If you mean that different lines have different order of variables, or different variables, is there a pattern to tell which are which?

 

I have written code to read as many as 25 different file "layouts" inside a single file but details matter in such things.

 

Do you have any documentation from the source about the file structure, Variable types or lengths, date value appearance?

If you don't have any of this you add a lot of headaches to "guess until it doesn't fail" type coding.

Emma2021
Quartz | Level 8
I do not want to open each csv file and fix the line (inserted a new line within one text variable)—since there are many csv files. Is there a way to fix those random new lines were inserted within one text variable?
Tom
Super User Tom
Super User

@Emma2021 wrote:
I do not want to open each csv file and fix the line (inserted a new line within one text variable)—since there are many csv files. Is there a way to fix those random new lines were inserted within one text variable?

Exactly.  That is why we asked you to open one of the files with issues and copy and paste a few of the lines where the issue appears and paste it into the Insert Code box that pops up.

 

Once we can see examples of the issue we can give you advice on whether or not a programmatic solution to clean up the files is possible.

ballardw
Super User

@Emma2021 wrote:
I do not want to open each csv file and fix the line (inserted a new line within one text variable)—since there are many csv files. Is there a way to fix those random new lines were inserted within one text variable?

 

If it is actually random, then almost certainly there is no "good" programming method to fix it? But are you sure they are actually "random" or based on content.  What is the response to the question about file layout documentation?

 

And did you open any of these in a spreadsheet program and then save it?  That can change the actual content.

Tom
Super User Tom
Super User

I assume you meant delimited.

Changing the delimiter to one that is less common reduces the chances of files that cannot be parsed, but does not eliminate the issue.  Unless the process of creating the files makes sure to remove any delimiters that appear in the data before generating the file.

 

What is the source of these files?  Are you making them yourself?  Are they all historical? Or do you plan to get more in the future?

If you are making them yourself then how are you making them?  Are you generating them from some database system? What system? What command are you using?

If you are getting them from someone else what agreements did you put in place in advance to make sure they delivered files that can be read?

Emma2021
Quartz | Level 8
Sorry, I don’t have my pc now.
A new line means as below:
After
the
dinner, all @
Etc. I made up an pseudo data. Thank you
Tom
Super User Tom
Super User

Details matter. 

If the values with the inserted end-of-line characters are quoted then it is easy to fix.

https://communities.sas.com/t5/New-SAS-User/Reading-CSV-file-with-double-quotes-line-breaks-and-spac...

 

Emma2021
Quartz | Level 8
Not always the end of variable has a new line space. Sometimes in the middle of variables have random new line spaces (I think even within 252 character length).

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1709 views
  • 0 likes
  • 3 in conversation