BookmarkSubscribeRSS Feed
GreggB
Pyrite | Level 9

Double quotes in the csv file are causing problems. I have used the only 2 methods I know for importing the file. Editing the csv file is not an option. Much of the data in the output has been starred out due to PII issues. The culprit is in red font. 

 

proc import datafile="FAKE\2021\sql_dev_export.csv" out=crs dbms=dlm  replace;
delimiter=',';
guessingrows=max;
run;

ALTERNATIVELY

data WORK.CRS    ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'G:\FAKEPATH\2021\sql_dev_export.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
format ID best12. ;
format LAST_NAME $40. ;
format FIRST_NAME $40. ;
format MIDDLE_NAME $30. ;
format DOB $10. ;
format GENDER $2. ;
format GRADE_LEVEL best12. ;
format STATE_STUDENTNUMBER best12. ;
format STUDENT_NUMBER best12. ;
format ALERT_OTHER $100. ;
format GUARDIANEMAIL $55. ;
format HOME_PHONE $14. ;
format STREET $36. ;
format CITY $21. ;
format ZIP $10. ;
format COURSE_NUMBER $42. ;
format SECTION_NUMBER $42. ;
format TERMID best12. ;
format CC_SCHOOLID best12. ;
format PREVTCHNUM $5. ;
format TEACHERNUMBER $8. ;
format TCHFIRST $40. ;
format TCHMID $11. ;
format TCHLAST $40. ;
format STATUS best12. ;
format EMAIL $26. ;
format COURSE_NAME $42. ;
format ENGL_PROF $8. ;
input
ID  
LAST_NAME  $
FIRST_NAME  $
MIDDLE_NAME  $
DOB  $
GENDER  $
GRADE_LEVEL  
STATE_STUDENTNUMBER  
STUDENT_NUMBER  
ALERT_OTHER  $
GUARDIANEMAIL  $
HOME_PHONE  $
STREET  $
CITY  $
ZIP  $
COURSE_NUMBER  $
SECTION_NUMBER  $
TERMID  
CC_SCHOOLID  
PREVTCHNUM  $
TEACHERNUMBER  
TCHFIRST  $
TCHMID  $
TCHLAST  $
STATUS
EMAIL  $
COURSE_NAME  $
ENGL_PROF  $
;
if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
run;

OUTPUT

Obs ID LAST_NAME FIRST_NAME MIDDLE_NAME DOB GENDER GRADE_LEVEL STATE_STUDENTNUMBER STUDENT_NUMBER ALERT_OTHER GUARDIANEMAIL HOME_PHONE STREET CITY ZIP COURSE_NUMBER SECTION_NUMBER TERMID CC_SCHOOLID PREVTCHNUM TEACHERNUMBER TCHFIRST TCHMID TCHLAST STATUS EMAIL COURSE_NAME ENGL_PROF 
23260 40627 ******************** 09/29/09 F 5 8977848156 15901 SDOC@Home **************@yahoo.com 864-***-**** 165 ******************** 29691 08960500 1 3000 28 9699 7643 ******** * ******** 1 ******@sdoc.org HR 5 9 
23261 40629 ******************* 07/14/10 F 4 6104058205 15903 "IEP               . .           .       
23262 .   864-***-**** 116 Wo*** Way Wal***** 29 8960400 5 3000 22 1403 2323 ****** * S*** 1 *****@@sdoc.org . 9           .       

LOG

NOTE: Invalid data for ID in line 1584 1-10.
NOTE: Invalid data for GRADE_LEVEL in line 1584 87-94.
NOTE: Invalid data for TERMID in line 1584 169-181.

 

10 REPLIES 10
ballardw
Super User

You don't provide any example of a specific value that is having issues. It does not have to be actual data but provide something and what is not working for a similar value.

 

I also submit that some of your variables that are numeric likely should not. Do you do arithmetic with Id, Termid, grade_level State_StudentNumber, CC_Schoolid ? If not then likely they should be character, especially with some of those throwing invalid data messages when read as numeric.

GreggB
Pyrite | Level 9

the culprits are the alert_other values with double quotes. For example, IEP" 

 

I tried the import using all character variables with the same result.

GreggB
Pyrite | Level 9
Observation 23262 is the continuation of 23261 . "IEP causes a return to a new line
Shmuel
Garnet | Level 18

What encoding has the imported file? Maybe the characters after the "IEP  are not of the default encoding?

Can you post line 1584 using the </> icon or attached as a .txt file? 

GreggB
Pyrite | Level 9

1584 SDOC@Home","**********143@gmail.com","810-***-****","215 Jas*** Ln","Seneca","29672","2
89 110M700","15",3000,26,22414,"9261","******","R","Ki****",1,"i******@sdoc.org","Mathema
177 tics 7","9" 187
ID=. LAST_NAME=****143@gmail.com FIRST_NAME=810-***- **** MIDDLE_NAME=2** Jas*** Ln
DOB=Seneca GENDER=29 GRADE_LEVEL=. STATE_STUDENTNUMBER=15 STUDENT_NUMBER=3000 ALERT_OTHER=26
GUARDIANEMAIL=22414 HOME_PHONE=9**1 STREET=I******* CITY=R ZIP=K***** COURSE_NUMBER=1
SECTION_NUMBER=i******@sdoc.org TERMID=. CC_SCHOOLID=9 PREVTCHNUM= TEACHERNUMBER= TCHFIRST=
TCHMID= TCHLAST= STATUS=. EMAIL= COURSE_NAME= ENGL_PROF= _ERROR_=1 _N_=1583

Shmuel
Garnet | Level 18

1) I have asked to copy the line into the window opened by clicking the </>  icon

    from the icons menu, thus the text will be delivered as is.

 

2) Did you copy the line from the input file itself?
    Are all those asterisks (********) in the original text or masked data because of          privacy?

 

3) In mail I got some of the text is displayed in BLUE color.
    Are there embedded characters in the text to control the color ?

 

4) Line 1584 doesn't contain the "IEP string. Is it in line 1583? 
    Then copy both lines to the opened window by  </>  icon.

 

My purpose is to regenerate the issue and try to overcome it.

GreggB
Pyrite | Level 9

1. I used the </> icon.
2. I copied from the log. ***** are to mask the data
3. No embedded chars

4. I copied from the LOG. Do you want a PROC  PRINT?

 

Shmuel
Garnet | Level 18

Copying from the log is bad and doesn't help. Log liens splits the data, add starting position and displays each variable with the value that was assigned to it.

 

To run a test I need the input lines as they are.

Best will be if you copy those lines from the input, using some external editor, to create a .txt file being a sample for testing.

GreggB
Pyrite | Level 9
difficult to explain..."IEP SDOC@Home" (in red) actually looks like the the red text at the bottom left. 

ID LAST_NAME FIRST_NAME MIDDLE_NAME DOB GENDER GRADE_LEVEL STATE_STUDENTNUMBER STUDENT_NUMBER ALERT_OTHER GUARDIANEMAIL HOME_PHONE STREET CITY ZIP COURSE_NUMBER SECTION_NUMBER TERMID CC_SCHOOLID PREVTCHNUM TEACHERNUMBER TCHFIRST TCHMID TCHLAST STATUS EMAIL COURSE_NAME ENGL_PROF
23509 **** Ca***** S**** 1/16/2017 M 8 5687983422 10*** IEP ************cess03@gmail.com 864-***-***8 231 *** H*** Dr M****** Rest 29664 2916C202 1 3000 30 8351 7391 A** * ****ad 1 *****ad@sdoc.org MS Science 9
23551 ******* Dan*** ****** 12/25/2007 M 7 ********* 10*** "IEP
SDOC@Home" krisneddie143@gmail.com 810-***-0*** *** ******* Ln Seneca 29672 2.92E+03 20 3000 26 1590 2587 ******er B Williams 1 jb*******@sdoc.org MS Lang Arts 9
23551 ******* Dan*** ****** 8/30/2007 M 7 5854056186 10631 "IEP
SDOC@Home" krisneddie143@gmail.com 810-558-0765 *** ******* Ln Seneca 29672 2110M700 15 3000 26 22414 9261 *****lle R K***** 1 i******@sdoc.org Mathematics 7 9


"IEP
SDOC@Home"

Tom
Super User Tom
Super User

The first note is probably the most important:

NOTE: Invalid data for ID in line 1584 1-10.

Most likely some value in line 1,583 has an embedded end of line characters so that what SAS is reading as line 1,584 is just the middle of the vlaue with the embedded end of line characters.

 

Can you remove the embedded end of line characters from the file?  If not and the values that contain the end of line characters are enclosed in double quote characters then you can fix it using one the methods explained in the other 57 gazillion versions of this question on this forum.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1247 views
  • 0 likes
  • 4 in conversation