URGENT : Import text based report missing several value

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

URGENT : Import text based report missing several value

Hi,

I tryed to import some text based report as attach, my script to import is:

data store  (drop=Key0);

   infile "C:\Users\Administrator\Desktop\YULIA\sample.txt" delimiter = ' ' MISSOVER DSD LRECL=10000 pad;

   retain account_number date time amount type response thestore;

/*input key1 $2-3 @;

                if key1 in ('3','4','5')then input

                @2      account_number2            $19.;*/

input

                @2      account_number          $19.

                @22     Date                  ddmmyy8.

                @31     time                 $4.

                @36     amount                  $15.

                @53     type                  $8.

                @62     response               $9.       ;

input key0 $52-59 @;

                if key0='THESTORE' then input

                @61    thestore            $35.;

format date date9.;

if date ne .;

run;

my problem is the output missing some observation like below :

account_numberdatetimeamounttyperesponsethestore
8-Jul-851210999,800RETAILMOREWALLMART MMM
0000-8888-9999-79908-Jul-851210790,000RETAILMOREPHILLIPS MMM
0000-8888-3333-78538-Jul-851240969,955RETAILMOREWE HAVE ALL PRODUCT MMM

the output should be like below :

account_numberdatetimeamounttyperesponsethestore
0000-8888-9999-88968-Jul-851210590,000RETAILMORETIMEZONE    MMM
8-Jul-851210999,800RETAILMOREWALLMART MMM
0000-8888-9999-79908-Jul-851210790,000RETAILMOREPHILLIPS MMM
0000-8888-3333-06978-Jul-85123057,000RETAILMOREYOUR STORE AVAILABLE MMM
0000-8888-3333-78538-Jul-851240969,955RETAILMOREWE HAVE ALL PRODUCT MMM

have anyone any idea whats wrong with my code that causing missing some observation?

thanks before.

Attachment

Accepted Solutions
Solution
‎08-27-2014 09:05 AM
Grand Advisor
Posts: 9,584

Re: URGENT : Import text based report missing several value

OK . Assuming date would always be non-missing .

data want;
 infile 'c:\temp\sample.txt' ;
 input;
 length  account_number  date time amount type response $ 100;
 retain  account_number  date time amount type response ;
 if prxmatch('/\d\d\/\d\d\/\d\d/o',_infile_) then do;
   account_number=substr(_infile_,2,19);
   date=left(substr(_infile_,22,8));
   time=left(substr(_infile_,31,4));
   amount=left(substr(_infile_,36,15));
   type=left(substr(_infile_,51,10));
   response=left(substr(_infile_,61,9));     
 end;
 else if prxmatch('/THESTORE:/io',_infile_) then do;
   thestore=left(compbl(scan(_infile_,-1,':'))); output;
 end;

run;

Xia Keshan

View solution in original post


All Replies
Esteemed Advisor
Esteemed Advisor
Posts: 7,228

Re: URGENT : Import text based report missing several value

Looks like you are getting a lot of bits in the dataset which you are dropping with the if date ne ..  Have a look without that line in and you will see that the data doesn't make much sense.

Capture.PNG

So you can see, in the above, account_number is read in from the file as blank in row 2, this value is retained for the next row which has a date, and thus comes out but with missing as account number.

From my side, I would say this "report" will take a fair bit of working to get into any readable manner.  It would be easier if you could get the source document.  Alternatively if that is not possible then read each line in as $2000.  Then post process it with further datasteps.

Occasional Contributor
Posts: 14

Re: URGENT : Import text based report missing several value

thanks for reply, I use date ne . to cleansing unnecessary data, because all transaction have date value, my problem is why account number 0000-8888-9999-8896 & 0000-8888-3333-0697 is missing from my dataset.

Esteemed Advisor
Esteemed Advisor
Posts: 7,228

Re: URGENT : Import text based report missing several value

Per my mail, when you look at what is imported the account number is missing for the first row where date is not missing, hence there is nothing to put in that variable, or retain for next records.  Hence why I suggest to read in all the data as long lengths, and then post-process the information to separate out the sections.  Ksharp has given a good example of the post-processing with perl regular expressions above.

Solution
‎08-27-2014 09:05 AM
Grand Advisor
Posts: 9,584

Re: URGENT : Import text based report missing several value

OK . Assuming date would always be non-missing .

data want;
 infile 'c:\temp\sample.txt' ;
 input;
 length  account_number  date time amount type response $ 100;
 retain  account_number  date time amount type response ;
 if prxmatch('/\d\d\/\d\d\/\d\d/o',_infile_) then do;
   account_number=substr(_infile_,2,19);
   date=left(substr(_infile_,22,8));
   time=left(substr(_infile_,31,4));
   amount=left(substr(_infile_,36,15));
   type=left(substr(_infile_,51,10));
   response=left(substr(_infile_,61,9));     
 end;
 else if prxmatch('/THESTORE:/io',_infile_) then do;
   thestore=left(compbl(scan(_infile_,-1,':'))); output;
 end;

run;

Xia Keshan

Occasional Contributor
Posts: 14

Re: URGENT : Import text based report missing several value

work like a charm!! amazing, thanks KSharp..

Occasional Contributor
Posts: 14

Re: URGENT : Import text based report missing several value

Ksharp, can you elaborate more what "/\d\d\/\d\d\/\d\d/o" is stand for? thanks.

Respected Advisor
Posts: 4,608

Re: URGENT : Import text based report missing several value

"/\d\d\/\d\d\/\d\d/o" is a pattern matching string describing what you are looking for.

"/.../o" frames the pattern and instruct PRXMATCH to compile the pattern only once, instead of every time the function is called.

\d matches a single digit

\/ matches s slash

So the pattern reads Look for two digits followed by a slash, followed by two digits, followed by a slash, followed by two digits.

PG

PG
Occasional Contributor
Posts: 14

Re: URGENT : Import text based report missing several value

Thanks for the answer PG, one more question, so the perl expressions is match two digits followed by a slash, followed by two digits, followed by a slash, followed by two digits, in my case, how come perl expression recognized blank account number for second observation like result below?

account_numberdatetimeamounttyperesponsethestore
0000-8888-9999-88968-Jul-851210590,000RETAILMORETIMEZONE    MMM
8-Jul-851210999,800RETAILMOREWALLMART MMM

Thanks PG.

Occasional Contributor
Posts: 14

Re: URGENT : Import text based report missing several value

Oh how stupid of me, sorry PG, yes now I understand that KSharp look for any date contain in text that have 2 digit with slash. Thanks man, youre all awesome.

Respected Advisor
Posts: 4,608

Re: URGENT : Import text based report missing several value

The pattern is not aimed at the account number, it is looking for a date. It matches 08/07/85. - PG

PG
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 354 views
  • 3 likes
  • 4 in conversation