I am tring to import a csv with multiple columns.One of the column is basically email data,with a lot of trail mails.When i import this file ,I see that the email column data is getting separated at random places,The data from this columns starts flowing in other columns,disturbing my structure of the table.For example if i have 4 columns ,date sent,time,customer id,customer email id,and email body.Data from email body starts flowing in all the other columns on import to SAS.
Please help how to solve this problem
Your problem is most probably coming from the fact that the mail body contains line breaks, which are interpreted by SAS before everything else.
So the mail body from message 1 spills over into "virtual" records.
I had a similar problem once and ended up hacking a quick C filter program that eliminates line breaks that are contained between double quotes.
You could write a similar algorithm in SAS.
Here's my C code:
(it replaces suspicious linefeeds with the HTML <BR> tag)
#include <stdio.h>
#include <unistd.h>
main (argc,argv)
int argc;
char *argv[];
{
FILE *ifp;
FILE *ofp;
int c;
int flag;
int iflag = 0;
int oflag = 0;
char *infile = NULL;
char *outfile = NULL;
opterr = 0;
while ((c = getopt(argc,argv,"i:o:")) != -1)
switch (c) {
case 'i':
iflag = 1;
infile = optarg;
break;
case 'o':
oflag = 1;
outfile = optarg;
break;
case '?':
if (optopt == 'c')
fprintf (stderr, "Option -%c requires an argumen
t.\n", optopt);
else if (isprint (optopt))
fprintf (stderr, "Unknown option `-%c'.\n", opto
pt);
else
fprintf (stderr,
"Unknown option character `\\x%x'.\n",
optopt);
return 1;
}
if (iflag == 1) {
ifp = fopen(infile,"r");
if (ifp == NULL) {
fprintf(stderr, "could not open infile %s\n",infile);
return 2;
}
}
else {
ifp = stdin;
}
if (oflag == 1) {
ofp = fopen(outfile,"w");
if (ofp == NULL) {
fprintf(stderr, "could not open outfile %s\n",outfile);
return 3;
}
else {
fprintf(stdout,"converting %s to %s\n",infile,outfile);
}
}
else {
ofp = stdout;
}
flag = 0;
while ((c = fgetc(ifp)) != EOF ) {
if (c == 39) {
flag = abs(flag - 1);
}
if ((c == 10) && (flag == 1)) {
fputc('<',ofp);
fputc('B',ofp);
fputc('R',ofp);
fputc('>',ofp);
}
else {
fputc(c,ofp);
}
}
}
Another method: Make sure that the mail body is enclosed in quotes (or a special separator that is most unlikely to appear in the text)
After reading a line, check if the last field is enclosed by these separator characters. If not, keep reading additional lines and append the content to the body field until the closing separator is encountered.
I would agree with @Kurt_Bremser, it may be that you can just get away with changing the infile delimeter to ¬ or something that wouldn't appear in the text. Can I suggest you post the code you have tried and an example file. I don't imagine that CSV is the best format for dealing with emails - they could contain anything and be in any structure. Where is the data coming from? Is it in an Office app, you me find it easier to process it in VBA and then dump the results out to CSV with the data you want.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.