- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
I am using SAS Enterprise Guide (Ver 8.3.8.206) and I have imported a delimited file, which imported with no problems. However, in the column with free text there are sometimes tilde characters that are followed by text. If the tilde character was alone in the cell then it was read - but rows that had a tilde character preceding text have been ignored by SAS. Is this because SAS considers the character to be a logical not? Can I tell SAS to read these rows or do I need to data clean using different software to detect any special characters before I can use SAS? The dataset exceeds Excel's limits.
The data looked something like this:
EVENT_ID|FREE_TEXT|SECOND TEXT|CODE
35434554654|~Description of free text|Second text box|23
Thanks for your help
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
Thanks for your help, that worked perfectly. I had a date variable so I had to add a formatted informat to the code. It took a while to generate enough code to view the rows with the tilde issue. I still have a few problems with the dataset but I’ve started a new question thread.
data want ; infile "D:\file_location.txt" dlm="|" dsd firstobs=2 truncover; length EVENT_ID $20. /* adapt if necessary */ Num_1 4. Num_2 5. CHAR_1 $5. CHAR_2 $32767. CHAR_3 $32767. CHAR_4 $32767. CHAR_5 $32767. DATE_1 8. UNSURE $3. NUM_3 5.; informat DATE_1 mmddyy10.; format DATE_1 yymmddn8. ; input EVENT_ID Num_1 Num_2 Num_1 CHAR_1 CHAR_2 CHAR_3 CHAR_4 CHAR_5 DATE_1 UNSURE NUM_3 ; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try this code:
data want (compress=yes);
infile "path_to_your_file" dlm="|" dsd firstobs=2 truncover;
length
event_id $11 /* adapt if necessary */
free_text $32767
second_text $32767
code $2 /* adapt if necessary */
;
input
event_id
free_text
second_text
code
;
run;
If this does not work as intended, post the log by copy/pasting it into a window opened with this button:
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
Thanks for your help, that worked perfectly. I had a date variable so I had to add a formatted informat to the code. It took a while to generate enough code to view the rows with the tilde issue. I still have a few problems with the dataset but I’ve started a new question thread.
data want ; infile "D:\file_location.txt" dlm="|" dsd firstobs=2 truncover; length EVENT_ID $20. /* adapt if necessary */ Num_1 4. Num_2 5. CHAR_1 $5. CHAR_2 $32767. CHAR_3 $32767. CHAR_4 $32767. CHAR_5 $32767. DATE_1 8. UNSURE $3. NUM_3 5.; informat DATE_1 mmddyy10.; format DATE_1 yymmddn8. ; input EVENT_ID Num_1 Num_2 Num_1 CHAR_1 CHAR_2 CHAR_3 CHAR_4 CHAR_5 DATE_1 UNSURE NUM_3 ; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
(compress=yes) only gave 74 rows of data
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@river1 wrote:
I've added in: (compress=char) this solved the problem of the long processing time.
(compress=yes) only gave 74 rows of data
No difference there, CHAR and YES are the same, see COMPRESS= Data Set Option
The COMPRESS= data set option only influences the space requirement (and, indirectly, the time needed to write it) of the resulting dataset. It does NOT have an impact on the number of observations read from the external file; this can be influenced by the TERMSTR= option of the INFILE statement.
If two different versions of code give results that puzzle you, post the complete logs as already mentioned.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My mistake compress=yes ran faster than compress=char. I think it was a problem with my computer when I ran the initial code
compress=yes Log: NOTE: DATA statement used (Total process time): real time 4:33.32 cpu time 2:53.39 compress=char
log: NOTE: DATA statement used (Total process time): real time 4:49.08 cpu time 3:33.51
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As already stated, CHAR and YES are identical. Different runtimes point to different load on the SAS computer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What do you mean by "imported"?
Did you use some GUI tool to import a file? What code did it generate?
Did you write some code yourself?
What did the LOG look like?
The string you posted does not cause any trouble for SAS to read.
data test;
infile datalines dsd dlm='|' truncover firstobs=2;
input EVENT_ID :$11. FREE_TEXT :$40. SECOND_TEXT :$30. CODE ;
datalines;
EVENT_ID|FREE_TEXT|SECOND TEXT|CODE
35434554654|~Description of free text|Second text box|23
;
Result
Obs EVENT_ID FREE_TEXT SECOND_TEXT CODE 1 35434554654 ~Description of free text Second text box 23