Help using Base SAS procedures

Help for reading messy data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Help for reading messy data

Hi,

I am trying to create a data record which contains non aligned fields. My have written the code as in attached editor file.

However I am not able to correctly render the ‘Details’ variable as shown in image below (it's only correct as shown in first observation)

Can somebody please suggest modifications in the code to read it correctly? (Please see attached lign code)

Your help is much appreciated.

Thanks.

Attachment

Accepted Solutions
Solution
‎05-04-2015 11:24 PM
Super Contributor
Posts: 275

Re: Help for reading messy data

I did this as Patrick suggested, maybe it will be helpful.

data joseph_balances;

input name$ 1-6 @13 amount  dollar7. /string $50.//;

datail=prxchange('s/(.*)\s(?=\d{1,2}\/\d{1,2}\/\d+).*/$1/i',-1,string);

date=put(input(prxchange('s/.*\s(\d{1,2}\/\d{1,2}\/\d+).*/$1/',-1,string),anydtdte.),mmddyy8.);

time=put(input(prxchange('s/.*\s(\d+:\d+ (AM|PM))/$1/',-1,string),time8.),time8.);

drop string;

cards;

joseph paid $18.96

DTE March 15 5/4/15 12:19 PM

for: All

joseph paid $25.00

Internet March 15 4/10/15 8:54 PM

for: All

joseph paid $26.18

DTE Feb 2/24/15 7:28 AM

for: All

joseph paid $610.00

Rent March 15 2/21/15 10:43 AM

for: All

joseph paid $25.00

Internet Jan 15 2/11/15 12:11 PM

for: All

joseph paid $610.00

Rent Feb 15 2/3/15 2:42 PM

for: All

joseph paid $26.54

DTE Jan 15 1/21/15 11:55 AM

for: All

joseph paid $25.00

Internet December 14 1/15/15 3:27 PM

for: All

joseph paid $72.00

Book 1/8/15 3:27 PM

for: All

joseph paid $23.47

DTE Dec 13 1/4/15 3:47 PM

for: All

joseph paid $610.00

Rent Jan 2015 1/1/15 8:28 PM

for: All

joseph paid $25.00

Internet November 12/2/14 8:28 PM

for: All

joseph paid $610.00

Rent Dec 12/2/14 8:28 PM

for: All

joseph paid $23.77

DTE November 11/19/14 8:03 PM

for: All

joseph paid $610.00

Rent November 11/2/14 1:08 PM

for: All

edith gave $102.00

Debt settlement 10/28/14 8:51 PM

to: joseph

joseph paid $55.00

Internet Sep Oct 10/27/14 4:35 PM

for: All

joseph paid $24.59

DTE oct 10/23/14 10:05 PM

for: All

joseph paid $102.00

Licence 10/1/14 3:39 PM

for: edith

joseph paid $610.00

Rent oct 10/1/14 12:08 PM

for: All

joseph paid $23.63

DTE sept 9/19/14 3:45 PM

for: All

;

proc print;

run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,896

Re: Help for reading messy data

Internet March 15 4/10/15 8:54 PM

I believe one way which could work is to read the whole 3rd line as a string, then find the position of the 3rd last word in this string.


Once you've got this position read the first bit of the string into your character variable and the second bit (starting from the 3rd last word) as a datetime value.



Solution
‎05-04-2015 11:24 PM
Super Contributor
Posts: 275

Re: Help for reading messy data

I did this as Patrick suggested, maybe it will be helpful.

data joseph_balances;

input name$ 1-6 @13 amount  dollar7. /string $50.//;

datail=prxchange('s/(.*)\s(?=\d{1,2}\/\d{1,2}\/\d+).*/$1/i',-1,string);

date=put(input(prxchange('s/.*\s(\d{1,2}\/\d{1,2}\/\d+).*/$1/',-1,string),anydtdte.),mmddyy8.);

time=put(input(prxchange('s/.*\s(\d+:\d+ (AM|PM))/$1/',-1,string),time8.),time8.);

drop string;

cards;

joseph paid $18.96

DTE March 15 5/4/15 12:19 PM

for: All

joseph paid $25.00

Internet March 15 4/10/15 8:54 PM

for: All

joseph paid $26.18

DTE Feb 2/24/15 7:28 AM

for: All

joseph paid $610.00

Rent March 15 2/21/15 10:43 AM

for: All

joseph paid $25.00

Internet Jan 15 2/11/15 12:11 PM

for: All

joseph paid $610.00

Rent Feb 15 2/3/15 2:42 PM

for: All

joseph paid $26.54

DTE Jan 15 1/21/15 11:55 AM

for: All

joseph paid $25.00

Internet December 14 1/15/15 3:27 PM

for: All

joseph paid $72.00

Book 1/8/15 3:27 PM

for: All

joseph paid $23.47

DTE Dec 13 1/4/15 3:47 PM

for: All

joseph paid $610.00

Rent Jan 2015 1/1/15 8:28 PM

for: All

joseph paid $25.00

Internet November 12/2/14 8:28 PM

for: All

joseph paid $610.00

Rent Dec 12/2/14 8:28 PM

for: All

joseph paid $23.77

DTE November 11/19/14 8:03 PM

for: All

joseph paid $610.00

Rent November 11/2/14 1:08 PM

for: All

edith gave $102.00

Debt settlement 10/28/14 8:51 PM

to: joseph

joseph paid $55.00

Internet Sep Oct 10/27/14 4:35 PM

for: All

joseph paid $24.59

DTE oct 10/23/14 10:05 PM

for: All

joseph paid $102.00

Licence 10/1/14 3:39 PM

for: edith

joseph paid $610.00

Rent oct 10/1/14 12:08 PM

for: All

joseph paid $23.63

DTE sept 9/19/14 3:45 PM

for: All

;

proc print;

run;

Respected Advisor
Posts: 3,896

Re: Help for reading messy data

Just as a variation to the code already provided using different RegEx functions.

data joseph_balances;

  if _N_=1 then

      _re=prxparse('/(.+)\b(\d+\/\d+\/\d+ \d+:\d+ (PM|AM))/i');

  retain _re;

  infile datalines truncover;

  input

        name$ 1-6 @13 amount  dollar7.

      / string $50.//;

  if prxmatch(_re, string) then

    do;

       detail=prxposn(_re, 1, string);

       dttm=input(prxposn(_re, 2, string),mdyampm25.2);

    end;

  format dttm datetime20.;

  drop string;

  cards;

......

Occasional Contributor
Posts: 6

Re: Help for reading messy data

Thanks,

This was too difficult for me.

Your time and efforts are much appreciated.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 305 views
  • 5 likes
  • 3 in conversation