Hi All,
I am trying to import a test file of about 1000k records, Fields are correctly delimited with pipe character but some contains control characters in between.
This caused unsuccessful file import. Is there any way to load this file. Sample as below.First line with field names.
Pln_Tmpl_Assmnt_Itm_Id|Answer_Text|Guideline_Text|Plan_Template_Assessment_Id|PlanTemplateAssessmentItemCmt|Question_Text|Sequence_Nbr|Source_Created_Date|Source_Created_By|Source_Last_Updated_Date|Source_Last_Updated_By|Source_System_Code|Record_Ownership_Code|g_snapshot_start_date|g_snapshot_end_date|g_extract_sk|g_extract_name|g_current_record_ind|g_deleted_record_ind|g_hash_md5|g_insert_job_sk|g_update_job_sk|g_insert_datetime|g_update_datetime
2-XDQC6W||"Record Goal 1 in 'comments' field and the outcome this relates to in 'answer' The text in the ""comment"" field will appear verbatim in the participant plan. It should reflect the goal in the participant's own words."|2-XDQC6V||Goal 1||01OCT2013:11:55:24.000|2-KMAOFW|01OCT2013:11:55:24.000|2-KMAOFW|Siebel|2|02OCT2013:00:00:00|16OCT2013:23:59:59|32|20131002|N|N|2BCA91FA567BF629D3361CBD38BD0EEA|1334||10MAR2015:11:27:36|
3-6FER8D|2|record details in comments including whose answer is recorded (particpant or guardian/parent/carer etc)Answer of 1 - 5 where 1 = very dissatisfied and2 = dissatisfied3 = neither satified nor dissatisfied4 = satisfied5 = very satisfied|3-6FER8C||Overall how satisfied are you with your level of community involvement?|1|05NOV2013:14:22:19.000|2-G32QYC|17DEC2013:13:28:41.000|2-G32QYC|Siebel|2|30MAR2014:00:00:00|01JAN5999:00:00:00|205|20140330|Y|N|43799D96C07315900EE81FD96B874B15|1334||10MAR2015:11:27:36|
3-6FER8E||record details in comments including whose answer is recorded (particpant or guardian/parent/carer etc)
Answer of 1 - 5 where
1 = very dissatisfied and
2 = dissatisfied
3 = neither satified nor dissatisfied
4 = satisfied
5 = very satisfied|3-6FER8C||Overall how satisfied are you with the study you have completed?||05NOV2013:14:22:19.000|2-G32QYC|05NOV2013:14:22:19.000|2-G32QYC|Siebel|2|06NOV2013:00:00:00|10NOV2013:23:59:59|67|20131106|N|N|D94237279E68D5AE74D7D46E25FD309C|1334||10MAR2015:11:27:36|
3-6FER8E|||3-6FER8C|record details in comments including whose answer is recorded (particpant or guardian/parent/carer etc)
Answer of 1 - 5 where
1 = very dissatisfied and
2 = dissatisfied
3 = neither satified nor dissatisfied
4 = satisfied
5 = very satisfied|Overall how satisfied are you with the study you have completed?||05NOV2013:14:22:19.000|2-G32QYC|05NOV2013:14:22:19.000|2-G32QYC|Siebel|2|11NOV2013:00:00:00|17DEC2013:23:59:59|71|20131111|N|N|0454683AC52A5DABAF11435F28637E26|1334||10MAR2015:11:27:36|
3-6FER8E|5||3-6FER8C|record details in comments including whose answer is recorded (particpant or guardian/parent/carer etc)
Answer of 1 - 5 where
1 = very dissatisfied and
2 = dissatisfied
3 = neither satified nor dissatisfied
4 = satisfied
5 = very satisfied|Overall how satisfied are you with the study you have completed?||05NOV2013:14:22:19.000|2-G32QYC|17DEC2013:13:28:48.000|2-G32QYC|Siebel|2|18DEC2013:00:00:00|29MAR2014:23:59:59|103|20131218|N|N|C09BD225FE07B3DAACD275259216108E|1334||10MAR2015:11:27:36|
It is actually switch line character like \r\n or \n . I was able to attached the file at last.
I don't understand what you mean to Control character, SAS can import it correctly, no problem ,unless it is switch line character like \r\n or \n .
By pasting your data into the browser some of the issues could have been removed. You will need to attach the data as text file for us to be able to see what's eventually going wrong.
It's not really clear what these "control characters" are - and especially if they are different from your normal end-of-line indicators.
Eventually option "TERMSTR=" can resolve the issue. SAS(R) 9.4 Statements: Reference, Third Edition
Hi Patrick,
I am not sure how to use "TERMSTR=" option in proc import, since it will identify end-of-line indicators of different operating systems. Issue here is I am having a file which contains end-of-line indicators in between field separators. Pl. explain.
In the file you've attached I can't see any control characters within a line. I can only see CRLF at the end of a line. I assume these "control characters" simply mean that your source data is "spread out" over multiple lines.
Are you by any chance transferring the file from Unix to Windows (just a thought).
With your file layout Proc Import will very likely not work but you need to use a data step.
You write that with the delimiters everything is o.k. - so I would have expected that if I read the data just field per field only creating an observation when reading data into the last input column (FLOWOVER), everything should look right. But that's not what I'm getting (see code below) so I assume something else is going on with your source data.
It's no problem to read data from multiple source columns - but what constitutes in your data the start of a new record?
data sample;
infile 'c:\temp\Test_1.txt' dsd dlm='|' flowover lrecl=500 firstobs=2;
input
(
Pln_Tmpl_Assmnt_Itm_Id
Answer_Text Guideline_Text Plan_Template_Assessment_Id PlanTemplateAssessmentItemCmt Question_Text Sequence_Nbr
Source_Created_Date Source_Created_By Source_Last_Updated_Date Source_Last_Updated_By Source_System_Code
Record_Ownership_Code g_snapshot_start_date g_snapshot_end_date g_extract_sk g_extract_name g_current_record_ind
g_deleted_record_ind g_hash_md5 g_insert_job_sk g_update_job_sk g_insert_datetime g_update_datetime
)
(:$20.);
;
run;
Hi Patrick,
Control characters are actually end of line characters like \r\n which can be identified if using a regular characters in textpad. Apology for my wrong heading.
Since it appears in between field separators, as I think the only way to identify a new line is by counting the number of fields per line, Which is 24 separated by pipes, That means after 24 fields are read by using 'I' as the only field separator a new line should be read for the next 24.
Also If I manually remove the end of lines I can perfectly create a 24 field observation.
Problem is how to implement this in a data step.
Thank you for your help.
Personally I would send the file back and say its not a valid delimited file. Delimited files should only have line breaks at the end of a line.
I also agree with Patrick in that you should not be relying on a guessing procedure to import your data.
I have also looked at this kind a problem some time time. SAS support provided me the code in this post, which works ok. You can pre-process the file, i.e. read it character by character, then output when not a control code:
However I would still suggest that you send the file back, thats what I did eventually and the vendor fixed the issue their end.
Thanks for the post RW9,I'll try out reading character by character and try to build 24 fields per line.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.