BookmarkSubscribeRSS Feed
Pj1989
Calcite | Level 5

I am tring to import a csv with multiple columns.One of the column is basically email data,with a lot of trail mails.When i import this file ,I see that the email column data is getting separated at random places,The data from this columns starts flowing in other columns,disturbing my structure of the table.For example if i have 4 columns ,date sent,time,customer id,customer email id,and email body.Data from email body starts flowing in all the other columns on import to SAS.

 

Please help how to solve this problem

2 REPLIES 2
Kurt_Bremser
Super User

Your problem is most probably coming from the fact that the mail body contains line breaks, which are interpreted by SAS before everything else.

So the mail body from message 1 spills over into "virtual" records.

I had a similar problem once and ended up hacking a quick C filter program that eliminates line breaks that are contained between double quotes.

You could write a similar algorithm in SAS.

 

Here's my C code:

(it replaces suspicious linefeeds with the HTML <BR> tag)


#include <stdio.h>
#include <unistd.h>

main (argc,argv)
int argc;
char *argv[];
{
        FILE *ifp;
        FILE *ofp;
        int c;
        int flag;
        int iflag = 0;
        int oflag = 0;
        char *infile = NULL;
        char *outfile = NULL;
        opterr = 0;
        while ((c = getopt(argc,argv,"i:o:")) != -1)
                switch (c) {
                        case 'i':
                                iflag = 1;
                                infile = optarg;
                                break;
                        case 'o':
                                oflag = 1;
                                outfile = optarg;
                                break;
                        case '?':
                                if (optopt == 'c')
                                fprintf (stderr, "Option -%c requires an argumen
t.\n", optopt);
                                else if (isprint (optopt))
                                fprintf (stderr, "Unknown option `-%c'.\n", opto
pt);
                                else
                                fprintf (stderr,
                                "Unknown option character `\\x%x'.\n",
                                optopt);
                                return 1;
                }

        if (iflag == 1) {
                ifp = fopen(infile,"r");
                if (ifp == NULL) {
                        fprintf(stderr, "could not open infile %s\n",infile);
                        return 2;
                }
        }
        else {
                ifp = stdin;
        }
        if (oflag == 1) {
                ofp = fopen(outfile,"w");
                if (ofp == NULL) {
                        fprintf(stderr, "could not open outfile %s\n",outfile);
                        return 3;
                }
                else {
                        fprintf(stdout,"converting %s to %s\n",infile,outfile);
                }
        }
        else {
                ofp = stdout;
        }

        flag = 0;
        while ((c = fgetc(ifp)) != EOF ) {
                if (c == 39) {
                        flag = abs(flag - 1);
                }
                if ((c == 10) && (flag == 1)) {
                        fputc('<',ofp);
                        fputc('B',ofp);
                        fputc('R',ofp);
                        fputc('>',ofp);
                }
                else {
                        fputc(c,ofp);
                }
        }
}

 

Another method: Make sure that the mail body is enclosed in quotes (or a special separator that is most unlikely to appear in the text)

After reading a line, check if the last field is enclosed by these separator characters. If not, keep reading additional lines and append the content to the body field until the closing separator is encountered.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I would agree with @Kurt_Bremser, it may be that you can just get away with changing the infile delimeter to ¬ or something that wouldn't appear in the text.  Can I suggest you post the code you have tried and an example file.  I don't imagine that CSV is the best format for dealing with emails - they could contain anything and be in any structure.  Where is the data coming from?  Is it in an Office app, you me find it easier to process it in VBA and then dump the results out to CSV with the data you want.  

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 817 views
  • 0 likes
  • 3 in conversation