BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
responcity
Calcite | Level 5

Hi,

I tryed to import some text based report as attach, my script to import is:

data store  (drop=Key0);

   infile "C:\Users\Administrator\Desktop\YULIA\sample.txt" delimiter = ' ' MISSOVER DSD LRECL=10000 pad;

   retain account_number date time amount type response thestore;

/*input key1 $2-3 @;

                if key1 in ('3','4','5')then input

                @2      account_number2            $19.;*/

input

                @2      account_number          $19.

                @22     Date                  ddmmyy8.

                @31     time                 $4.

                @36     amount                  $15.

                @53     type                  $8.

                @62     response               $9.       ;

input key0 $52-59 @;

                if key0='THESTORE' then input

                @61    thestore            $35.;

format date date9.;

if date ne .;

run;

my problem is the output missing some observation like below :

account_numberdatetimeamounttyperesponsethestore
8-Jul-851210999,800RETAILMOREWALLMART MMM
0000-8888-9999-79908-Jul-851210790,000RETAILMOREPHILLIPS MMM
0000-8888-3333-78538-Jul-851240969,955RETAILMOREWE HAVE ALL PRODUCT MMM

the output should be like below :

account_numberdatetimeamounttyperesponsethestore
0000-8888-9999-88968-Jul-851210590,000RETAILMORETIMEZONE    MMM
8-Jul-851210999,800RETAILMOREWALLMART MMM
0000-8888-9999-79908-Jul-851210790,000RETAILMOREPHILLIPS MMM
0000-8888-3333-06978-Jul-85123057,000RETAILMOREYOUR STORE AVAILABLE MMM
0000-8888-3333-78538-Jul-851240969,955RETAILMOREWE HAVE ALL PRODUCT MMM

have anyone any idea whats wrong with my code that causing missing some observation?

thanks before.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

OK . Assuming date would always be non-missing .

data want;
 infile 'c:\temp\sample.txt' ;
 input;
 length  account_number  date time amount type response $ 100;
 retain  account_number  date time amount type response ;
 if prxmatch('/\d\d\/\d\d\/\d\d/o',_infile_) then do;
   account_number=substr(_infile_,2,19);
   date=left(substr(_infile_,22,8));
   time=left(substr(_infile_,31,4));
   amount=left(substr(_infile_,36,15));
   type=left(substr(_infile_,51,10));
   response=left(substr(_infile_,61,9));     
 end;
 else if prxmatch('/THESTORE:/io',_infile_) then do;
   thestore=left(compbl(scan(_infile_,-1,':'))); output;
 end;

run;

Xia Keshan

View solution in original post

10 REPLIES 10
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Looks like you are getting a lot of bits in the dataset which you are dropping with the if date ne ..  Have a look without that line in and you will see that the data doesn't make much sense.

Capture.PNG

So you can see, in the above, account_number is read in from the file as blank in row 2, this value is retained for the next row which has a date, and thus comes out but with missing as account number.

From my side, I would say this "report" will take a fair bit of working to get into any readable manner.  It would be easier if you could get the source document.  Alternatively if that is not possible then read each line in as $2000.  Then post process it with further datasteps.

responcity
Calcite | Level 5

thanks for reply, I use date ne . to cleansing unnecessary data, because all transaction have date value, my problem is why account number 0000-8888-9999-8896 & 0000-8888-3333-0697 is missing from my dataset.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Per my mail, when you look at what is imported the account number is missing for the first row where date is not missing, hence there is nothing to put in that variable, or retain for next records.  Hence why I suggest to read in all the data as long lengths, and then post-process the information to separate out the sections.  Ksharp has given a good example of the post-processing with perl regular expressions above.

Ksharp
Super User

OK . Assuming date would always be non-missing .

data want;
 infile 'c:\temp\sample.txt' ;
 input;
 length  account_number  date time amount type response $ 100;
 retain  account_number  date time amount type response ;
 if prxmatch('/\d\d\/\d\d\/\d\d/o',_infile_) then do;
   account_number=substr(_infile_,2,19);
   date=left(substr(_infile_,22,8));
   time=left(substr(_infile_,31,4));
   amount=left(substr(_infile_,36,15));
   type=left(substr(_infile_,51,10));
   response=left(substr(_infile_,61,9));     
 end;
 else if prxmatch('/THESTORE:/io',_infile_) then do;
   thestore=left(compbl(scan(_infile_,-1,':'))); output;
 end;

run;

Xia Keshan

responcity
Calcite | Level 5

work like a charm!! amazing, thanks KSharp..

responcity
Calcite | Level 5

Ksharp, can you elaborate more what "/\d\d\/\d\d\/\d\d/o" is stand for? thanks.

PGStats
Opal | Level 21

"/\d\d\/\d\d\/\d\d/o" is a pattern matching string describing what you are looking for.

"/.../o" frames the pattern and instruct PRXMATCH to compile the pattern only once, instead of every time the function is called.

\d matches a single digit

\/ matches s slash

So the pattern reads Look for two digits followed by a slash, followed by two digits, followed by a slash, followed by two digits.

PG

PG
responcity
Calcite | Level 5

Thanks for the answer PG, one more question, so the perl expressions is match two digits followed by a slash, followed by two digits, followed by a slash, followed by two digits, in my case, how come perl expression recognized blank account number for second observation like result below?

account_numberdatetimeamounttyperesponsethestore
0000-8888-9999-88968-Jul-851210590,000RETAILMORETIMEZONE    MMM
8-Jul-851210999,800RETAILMOREWALLMART MMM

Thanks PG.

responcity
Calcite | Level 5

Oh how stupid of me, sorry PG, yes now I understand that KSharp look for any date contain in text that have 2 digit with slash. Thanks man, youre all awesome.

PGStats
Opal | Level 21

The pattern is not aimed at the account number, it is looking for a date. It matches 08/07/85. - PG

PG

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 877 views
  • 3 likes
  • 4 in conversation