Solved: URGENT : Import text based report missing several value

responcity · Posted 08-27-2014 08:26 AM

Hi,

I tryed to import some text based report as attach, my script to import is:

data store (drop=Key0);

infile "C:\Users\Administrator\Desktop\YULIA\sample.txt" delimiter = ' ' MISSOVER DSD LRECL=10000 pad;

retain account_number date time amount type response thestore;

/*input key1 $2-3 @;

if key1 in ('3','4','5')then input

@2 account_number2 $19.;*/

input

@2 account_number $19.

@22 Date ddmmyy8.

@31 time $4.

@36 amount $15.

@53 type $8.

@62 response $9. ;

input key0 $52-59 @;

if key0='THESTORE' then input

@61 thestore $35.;

format date date9.;

if date ne .;

run;

my problem is the output missing some observation like below :

account_number	date	time	amount	type	response	thestore
	8-Jul-85	1210	999,800	RETAIL	MORE	WALLMART MMM
0000-8888-9999-7990	8-Jul-85	1210	790,000	RETAIL	MORE	PHILLIPS MMM
0000-8888-3333-7853	8-Jul-85	1240	969,955	RETAIL	MORE	WE HAVE ALL PRODUCT MMM

the output should be like below :

account_number	date	time	amount	type	response	thestore
0000-8888-9999-8896	8-Jul-85	1210	590,000	RETAIL	MORE	TIMEZONE MMM
	8-Jul-85	1210	999,800	RETAIL	MORE	WALLMART MMM
0000-8888-9999-7990	8-Jul-85	1210	790,000	RETAIL	MORE	PHILLIPS MMM
0000-8888-3333-0697	8-Jul-85	1230	57,000	RETAIL	MORE	YOUR STORE AVAILABLE MMM
0000-8888-3333-7853	8-Jul-85	1240	969,955	RETAIL	MORE	WE HAVE ALL PRODUCT MMM

have anyone any idea whats wrong with my code that causing missing some observation?

thanks before.

Ksharp · Posted 08-27-2014 09:05 AM

OK . Assuming date would always be non-missing .

data want;
 infile 'c:\temp\sample.txt' ;
 input;
 length  account_number  date time amount type response $ 100;
 retain  account_number  date time amount type response ;
 if prxmatch('/\d\d\/\d\d\/\d\d/o',_infile_) then do;
   account_number=substr(_infile_,2,19);
   date=left(substr(_infile_,22,8));
   time=left(substr(_infile_,31,4));
   amount=left(substr(_infile_,36,15));
   type=left(substr(_infile_,51,10));
   response=left(substr(_infile_,61,9));     
 end;
 else if prxmatch('/THESTORE:/io',_infile_) then do;
   thestore=left(compbl(scan(_infile_,-1,':'))); output;
 end;

run;

Xia Keshan

View solution in original post

RW9 · Posted 08-27-2014 08:41 AM

Looks like you are getting a lot of bits in the dataset which you are dropping with the if date ne .. Have a look without that line in and you will see that the data doesn't make much sense.

So you can see, in the above, account_number is read in from the file as blank in row 2, this value is retained for the next row which has a date, and thus comes out but with missing as account number.

From my side, I would say this "report" will take a fair bit of working to get into any readable manner. It would be easier if you could get the source document. Alternatively if that is not possible then read each line in as $2000. Then post process it with further datasteps.

responcity · Posted 08-27-2014 08:47 AM

thanks for reply, I use date ne . to cleansing unnecessary data, because all transaction have date value, my problem is why account number 0000-8888-9999-8896 & 0000-8888-3333-0697 is missing from my dataset.

RW9 · Posted 08-27-2014 09:14 AM

Per my mail, when you look at what is imported the account number is missing for the first row where date is not missing, hence there is nothing to put in that variable, or retain for next records. Hence why I suggest to read in all the data as long lengths, and then post-process the information to separate out the sections. Ksharp has given a good example of the post-processing with perl regular expressions above.

Ksharp · Posted 08-27-2014 09:05 AM

OK . Assuming date would always be non-missing .

data want;
 infile 'c:\temp\sample.txt' ;
 input;
 length  account_number  date time amount type response $ 100;
 retain  account_number  date time amount type response ;
 if prxmatch('/\d\d\/\d\d\/\d\d/o',_infile_) then do;
   account_number=substr(_infile_,2,19);
   date=left(substr(_infile_,22,8));
   time=left(substr(_infile_,31,4));
   amount=left(substr(_infile_,36,15));
   type=left(substr(_infile_,51,10));
   response=left(substr(_infile_,61,9));     
 end;
 else if prxmatch('/THESTORE:/io',_infile_) then do;
   thestore=left(compbl(scan(_infile_,-1,':'))); output;
 end;

run;

Xia Keshan

responcity · Posted 08-27-2014 09:22 PM

work like a charm!! amazing, thanks KSharp..

responcity · Posted 08-27-2014 09:27 PM

Ksharp, can you elaborate more what "/\d\d\/\d\d\/\d\d/o" is stand for? thanks.

PGStats · Posted 08-27-2014 09:59 PM

"/\d\d\/\d\d\/\d\d/o" is a pattern matching string describing what you are looking for.

"/.../o" frames the pattern and instruct PRXMATCH to compile the pattern only once, instead of every time the function is called.

\d matches a single digit

\/ matches s slash

So the pattern reads Look for two digits followed by a slash, followed by two digits, followed by a slash, followed by two digits.

PG

responcity · Posted 08-27-2014 10:24 PM

Thanks for the answer PG, one more question, so the perl expressions is match two digits followed by a slash, followed by two digits, followed by a slash, followed by two digits, in my case, how come perl expression recognized blank account number for second observation like result below?

account_number	date	time	amount	type	response	thestore
0000-8888-9999-8896	8-Jul-85	1210	590,000	RETAIL	MORE	TIMEZONE MMM
	8-Jul-85	1210	999,800	RETAIL	MORE	WALLMART MMM

Thanks PG.

responcity · Posted 08-27-2014 10:37 PM

Oh how stupid of me, sorry PG, yes now I understand that KSharp look for any date contain in text that have 2 digit with slash. Thanks man, youre all awesome.

PGStats · Posted 08-27-2014 10:38 PM

The pattern is not aimed at the account number, it is looking for a date. It matches 08/07/85. - PG

PG

URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Re: URGENT : Import text based report missing several value

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away