DATA Step, Macro, Functions and more

Import csv

Reply
New Contributor
Posts: 3

Import csv

I am tring to import a csv with multiple columns.One of the column is basically email data,with a lot of trail mails.When i import this file ,I see that the email column data is getting separated at random places,The data from this columns starts flowing in other columns,disturbing my structure of the table.For example if i have 4 columns ,date sent,time,customer id,customer email id,and email body.Data from email body starts flowing in all the other columns on import to SAS.

 

Please help how to solve this problem

Super User
Posts: 6,946

Re: Import csv

Your problem is most probably coming from the fact that the mail body contains line breaks, which are interpreted by SAS before everything else.

So the mail body from message 1 spills over into "virtual" records.

I had a similar problem once and ended up hacking a quick C filter program that eliminates line breaks that are contained between double quotes.

You could write a similar algorithm in SAS.

 

Here's my C code:

(it replaces suspicious linefeeds with the HTML <BR> tag)


#include <stdio.h>
#include <unistd.h>

main (argc,argv)
int argc;
char *argv[];
{
        FILE *ifp;
        FILE *ofp;
        int c;
        int flag;
        int iflag = 0;
        int oflag = 0;
        char *infile = NULL;
        char *outfile = NULL;
        opterr = 0;
        while ((c = getopt(argc,argv,"iSmiley Surprised:")) != -1)
                switch (c) {
                        case 'i':
                                iflag = 1;
                                infile = optarg;
                                break;
                        case 'o':
                                oflag = 1;
                                outfile = optarg;
                                break;
                        case '?':
                                if (optopt == 'c')
                                fprintf (stderr, "Option -%c requires an argumen
t.\n", optopt);
                                else if (isprint (optopt))
                                fprintf (stderr, "Unknown option `-%c'.\n", opto
pt);
                                else
                                fprintf (stderr,
                                "Unknown option character `\\x%x'.\n",
                                optopt);
                                return 1;
                }

        if (iflag == 1) {
                ifp = fopen(infile,"r");
                if (ifp == NULL) {
                        fprintf(stderr, "could not open infile %s\n",infile);
                        return 2;
                }
        }
        else {
                ifp = stdin;
        }
        if (oflag == 1) {
                ofp = fopen(outfile,"w");
                if (ofp == NULL) {
                        fprintf(stderr, "could not open outfile %s\n",outfile);
                        return 3;
                }
                else {
                        fprintf(stdout,"converting %s to %s\n",infile,outfile);
                }
        }
        else {
                ofp = stdout;
        }

        flag = 0;
        while ((c = fgetc(ifp)) != EOF ) {
                if (c == 39) {
                        flag = abs(flag - 1);
                }
                if ((c == 10) && (flag == 1)) {
                        fputc('<',ofp);
                        fputc('B',ofp);
                        fputc('R',ofp);
                        fputc('>',ofp);
                }
                else {
                        fputc(c,ofp);
                }
        }
}

 

Another method: Make sure that the mail body is enclosed in quotes (or a special separator that is most unlikely to appear in the text)

After reading a line, check if the last field is enclosed by these separator characters. If not, keep reading additional lines and append the content to the body field until the closing separator is encountered.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Super User
Posts: 7,407

Re: Import csv

I would agree with @KurtBremser, it may be that you can just get away with changing the infile delimeter to ¬ or something that wouldn't appear in the text.  Can I suggest you post the code you have tried and an example file.  I don't imagine that CSV is the best format for dealing with emails - they could contain anything and be in any structure.  Where is the data coming from?  Is it in an Office app, you me find it easier to process it in VBA and then dump the results out to CSV with the data you want.  

Ask a Question
Discussion stats
  • 2 replies
  • 161 views
  • 0 likes
  • 3 in conversation