BookmarkSubscribeRSS Feed
Pj1989
Calcite | Level 5

I am tring to import a csv with multiple columns.One of the column is basically email data,with a lot of trail mails.When i import this file ,I see that the email column data is getting separated at random places,The data from this columns starts flowing in other columns,disturbing my structure of the table.For example if i have 4 columns ,date sent,time,customer id,customer email id,and email body.Data from email body starts flowing in all the other columns on import to SAS.

 

Please help how to solve this problem

2 REPLIES 2
Kurt_Bremser
Super User

Your problem is most probably coming from the fact that the mail body contains line breaks, which are interpreted by SAS before everything else.

So the mail body from message 1 spills over into "virtual" records.

I had a similar problem once and ended up hacking a quick C filter program that eliminates line breaks that are contained between double quotes.

You could write a similar algorithm in SAS.

 

Here's my C code:

(it replaces suspicious linefeeds with the HTML <BR> tag)


#include <stdio.h>
#include <unistd.h>

main (argc,argv)
int argc;
char *argv[];
{
        FILE *ifp;
        FILE *ofp;
        int c;
        int flag;
        int iflag = 0;
        int oflag = 0;
        char *infile = NULL;
        char *outfile = NULL;
        opterr = 0;
        while ((c = getopt(argc,argv,"i:o:")) != -1)
                switch (c) {
                        case 'i':
                                iflag = 1;
                                infile = optarg;
                                break;
                        case 'o':
                                oflag = 1;
                                outfile = optarg;
                                break;
                        case '?':
                                if (optopt == 'c')
                                fprintf (stderr, "Option -%c requires an argumen
t.\n", optopt);
                                else if (isprint (optopt))
                                fprintf (stderr, "Unknown option `-%c'.\n", opto
pt);
                                else
                                fprintf (stderr,
                                "Unknown option character `\\x%x'.\n",
                                optopt);
                                return 1;
                }

        if (iflag == 1) {
                ifp = fopen(infile,"r");
                if (ifp == NULL) {
                        fprintf(stderr, "could not open infile %s\n",infile);
                        return 2;
                }
        }
        else {
                ifp = stdin;
        }
        if (oflag == 1) {
                ofp = fopen(outfile,"w");
                if (ofp == NULL) {
                        fprintf(stderr, "could not open outfile %s\n",outfile);
                        return 3;
                }
                else {
                        fprintf(stdout,"converting %s to %s\n",infile,outfile);
                }
        }
        else {
                ofp = stdout;
        }

        flag = 0;
        while ((c = fgetc(ifp)) != EOF ) {
                if (c == 39) {
                        flag = abs(flag - 1);
                }
                if ((c == 10) && (flag == 1)) {
                        fputc('<',ofp);
                        fputc('B',ofp);
                        fputc('R',ofp);
                        fputc('>',ofp);
                }
                else {
                        fputc(c,ofp);
                }
        }
}

 

Another method: Make sure that the mail body is enclosed in quotes (or a special separator that is most unlikely to appear in the text)

After reading a line, check if the last field is enclosed by these separator characters. If not, keep reading additional lines and append the content to the body field until the closing separator is encountered.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I would agree with @Kurt_Bremser, it may be that you can just get away with changing the infile delimeter to ¬ or something that wouldn't appear in the text.  Can I suggest you post the code you have tried and an example file.  I don't imagine that CSV is the best format for dealing with emails - they could contain anything and be in any structure.  Where is the data coming from?  Is it in an Office app, you me find it easier to process it in VBA and then dump the results out to CSV with the data you want.  

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1099 views
  • 0 likes
  • 3 in conversation