BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ama220
Obsidian | Level 7

I imported a cvs comma delimited file (that I saved from an excel file). This file contains some numeric variables as well as character variables. The character variables contain open ended narrative descriptions that can often be long.

 

When I run a proc freq or a proc print, I get ineligible values for some variables, such as " , M, 1, 2. In one instance, the frequency contains a value from another variable. The problem seems to be limited to a few lines of data only but it is disruptive of my analysis. 

 

Here is an SAS Output that is not legible 

 

ID

Frequency

Percent

"

2

1

 
 
Thanks 
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

A common issue with exported from Excel files that have manual entries are people using Alt-Enter to create multiple "lines" in the Excel file. By default when these are exported (saved-as) CSV the "alt-enter" creates a line in the data so that a record occupies more than one line in the CSV. Proc Import does not really expect such. So some variables actually end up seeing values from more than one "column" in Excel.

View solution in original post

10 REPLIES 10
r_behata
Barite | Level 11

Could you post few lines of sample data and the code you have tried .

 

And what is your criteria for ineligible values ?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You saved the data to a plain text format called Comma Separated Variable (CSV) correct?  I would avoid using proc import - it is a guessing procedure so if it guesses wrong your data will be messed up.  In the log, after you run the proc import, you will see a big datastep with all your variables.  Copy that code out into your editor, then change it to accurately reflect your data.  So you will have something which looks like:

data want;
  infile ".../yourcsv.csv" dlm=",";
  input ...;
  format ...;
  informat ...;
  length ...;
run;

That should get your data in as you expect as you are the one deciding how it is imported.

If however your data are longer than 2000 characters then you have more of a problem.

ama220
Obsidian | Level 7

Thanks for your response. I am using the import statement as follows

 

PROC IMPORT OUT= narr.filename 
DATAFILE= "c:\filename _nar.csv"
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
GUESSINGROWS=MAX;
RUN;

ballardw
Super User

A common issue with exported from Excel files that have manual entries are people using Alt-Enter to create multiple "lines" in the Excel file. By default when these are exported (saved-as) CSV the "alt-enter" creates a line in the data so that a record occupies more than one line in the CSV. Proc Import does not really expect such. So some variables actually end up seeing values from more than one "column" in Excel.

ama220
Obsidian | Level 7

Thanks! I believe that is the issue. Any idea how to spot the cells where the "alt-enter" occurs in the excel file?

Tom
Super User Tom
Super User

You should be able to read the lines with the embedded linefeeds by using the TERMSTR=CRLF option on the FILENAME or INFILE statement.

Just make sure you haven't moved the file via FTP (or other protocol) that will change the end of line characters in the file from CRLF to just LF because then SAS will not know how to tell the difference between real end of line markers and those that appear in the middle of a field's value.

ama220
Obsidian | Level 7

Thank you for putting me on the right track. I ended up having to go through the excel file and deleting the extra spaces at the end of several cells to resolve the problem. Although the code was helpful, it did not solve the problem completely. 

ama220
Obsidian | Level 7

Thank you for putting me on the right track. I ended up having to go through the excel file and deleting the extra spaces at the end of several cells that I had manually coded to resolve the problem. Although the code was helpful, it did not solve the problem completely. 

ballardw
Super User

If the TERMSTR option mentioned by @Tom doesn't work and the file isn't very large you can try opening the file in a TEXT editor such as Notepad (turn off word wrap) or even WordPad. The problem lines will generally appear different at the left hand margin. The fix is likely to be a backspace from the start of the problem area. Possibly multiple backspace to get each record reduced to single lines. This is more fun when there are multiple columns in the Excel file that do the Alt-Enter.

 

If you don't know how to write a data step to read the file start with the proc import that you have. When you run that the log will contain the data step code used to read your file. So you can add the TERMSTR to the infile there. Also, check the informat statements which will set variable types to see if the columns match you expected types. If the value should be numeric then the informat is likely going to be BEST32 as assigned by Proc Import or possibly COMMA something if you have currency values. If the value is numeric and you see a $ in the informat then the value is being read as character and you need to adjust accordingly. You may also take the time to do search and replace to get nicer variable names and assign meaningful labels to the program reading the data.

Most of the FORMAT statements generated are not going to be useful if they are character and can be safely removed. The FORMATS for numeric will likely be BEST something and could also be removed unless you have an actual preference.

CAUTION: Import will often assign numeric properties to identifiers such as account or other identification numbers which can cause issues with loss of significant leading zeroes or overall precision of storage for very long values (16 or more digits). Consider setting these as CHARACTER with an appropriate $ informat.

 

I have worked off and on with data for more than 30 years and have never had someone ask for the "average account number" or standard deviation of identification numbers. So if the variable is not used for arithmetic it likely should be character.

 

ama220
Obsidian | Level 7

Thank you for your helpful insight and for taking the time to explain all this detail! It was helpful for me. 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1220 views
  • 5 likes
  • 5 in conversation