KLMQ
Calcite | Level 5

Hi, 

 

Can anyone help me solve this problem? I have daily text files of 2,000+ pages each, and I need to capture all of the fields in each file, i.e.:

Application No, Application Name, Branch, Address 1, 2, 3, Gender, Marital Status, ID No, DOB, Application Date, Page.

All of these fields need to be converted to a SAS dataset, and the process needs to repeat until the last page.

 

Sample data

 

Report Name: Personal Information  Date printed:12/02/2020                               Page:1

XYZ Enterprise                           Branch:100                                         Process Date:12/02/2020                

 

Application No:               1234567890

Application Name:          XxYyyZzz AAA BBB

Address 1:                       1, Xmas street 2,

              2:                       New York

              3:                       USA

 

Report Name: Personal Information  Date printed:12/02/2020                              Page:2

XYZ Enterprise                           Branch:100                                         Process Date:12/02/2020  

 

Gender:                                     F                                       ID No:                               987654321

Marital Status:                           M                                      DOB:                                    03/03/1967

 

 

Report Name: Personal Information  Date printed:12/02/2020                              Page:3

 

XYZ Enterprise                           Branch:100                                         Process Date:12/02/2020  

 

Application Date:                         03/12/2019

 

japelin
Rhodochrosite | Level 12

This is a fairly primitive approach, but how about the following program?

 

 

filename get "c:\temp\sampledata.txt";
data worktable;
  length RepName $80
         AppNo $10
         AppName $16
         DOPRNT 8
         Branch 8
         Address1 $80
         Address2 $80
         Address3 $80
         Gender $1 
         Marital $1 
         IDNo $10 
         DOB 8
         DOPRCS 8
         AppDate 8
         Page 8
         n 8
         ;
  format 
         DOPRNT mmddyy10.
         DOB mmddyy10.
         DOPRCS mmddyy10.
         AppDate mmddyy10.
         ;
  array pos{3} 8;
  retain _all_;
  infile get;

  input;
  n=_n_;
  if find(_infile_,"Report Name:",'i')=1 then do;
    /* Reset every retained field at the start of each page. */
    call missing(RepName, AppNo, AppName, DOPRNT, Branch,
                 Address1, Address2, Address3, Gender, Marital,
                 IDNo, DOB, DOPRCS, AppDate, Page);
    pos1=length("Report Name:");
    pos2=find(_infile_,"Date printed:",'i');
    pos3=length("Date printed:");
    /* Slice only when "Date printed:" is found; otherwise substr would get a negative length. */
    if pos2>1 then do;
      RepName=strip(substr(_infile_,pos1+1,pos2-pos1-1));
      DOPRNT=input(strip(substr(_infile_,pos2+pos3,10)),ddmmyy10.);
    end;
    
    pos2=find(_infile_,"Page:",'i');
    pos3=length("Page:");
    if pos2>1 then do;
      Page=input(strip(substr(_infile_,pos2+pos3)),best.);
    end;
  end;else
  if find(_infile_,"Branch:",'i')>1 then do;
    pos2=find(_infile_,"Branch:",'i');
    pos3=length("Branch:");
    Branch=input(strip(substr(_infile_,pos2+pos3,12)),best.);
    
    pos2=find(_infile_,"Process Date:",'i');
    pos3=length("Process Date:");
    if pos2>1 then do;
      DOPRCS=input(strip(substr(_infile_,pos2+pos3,10)),ddmmyy10.);
    end;
  end;else
  if find(_infile_,"Application No:",'i')=1 then do;
    AppNo=strip(substr(_infile_,length("Application No:")+1));
  end;else
  if find(_infile_,"Application Name:",'i')=1 then do;
    AppName=strip(substr(_infile_,length("Application Name:")+1));
  end;else
  if find(_infile_,"Application Date:",'i')=1 then do;
    AppDate=input(strip(substr(_infile_,length("Application Date:")+1)),ddmmyy10.);
  end;else
  if find(_infile_,"Gender:",'i')=1 then do;

    pos1=length("Gender:");
    pos2=find(_infile_,"ID No:",'i');
    pos3=length("ID No:");

    /* Take only the text between "Gender:" and "ID No:". */
    Gender=strip(substr(_infile_,pos1+1,pos2-pos1-1));
    IDNo=strip(substr(_infile_,pos2+pos3));
  end;else
  if find(_infile_,"Marital Status:",'i')=1 then do;

    pos1=length("Marital Status:");
    pos2=find(_infile_,"DOB:",'i');
    pos3=length("DOB:");

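    /* Take only the text between "Marital Status:" and "DOB:". */
    Marital=strip(substr(_infile_,pos1+1,pos2-pos1-1));
    /* DOB is read with mmddyy10. while the other dates use ddmmyy10.; check which order the file uses. */
    DOB=input(strip(substr(_infile_,pos2+pos3)),mmddyy10.);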
  end;else
  if find(_infile_,"Address 1:",'i')=1 then do;
    Address1=strip(substr(_infile_,length("Address 1:")+1));
  end;else
  if find(_infile_," 2:",'i')>1 then do;
    pos2=find(_infile_," 2:",'i');
    pos3=length(" 2:");
    Address2=strip(substr(_infile_,pos2+pos3));
  end;else
  if find(_infile_," 3:",'i')>1 then do;
    pos2=find(_infile_," 3:",'i');
    pos3=length(" 3:");
    Address3=strip(substr(_infile_,pos2+pos3));
  end;
  drop pos:;
run;
proc sort data=worktable;
  by n page;
run;
data get;
  set worktable;
  by page;
  if last.page;
  drop n;
run;
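
Note that the program above resets the fields at every "Report Name:" line, so GET ends up with one row per page. If the three pages in the sample really belong to one application, one possible way to collapse them into a single row is sketched below. It assumes (this is a guess about the layout, not something the sample confirms) that each application starts on a page carrying "Application No:" and that the following pages belong to it until the next application number appears. AppKey, keyed and final are names made up for this sketch.

```sas
/* Carry the application number forward onto the pages that follow it. */
data keyed;
  length AppKey $10;
  retain AppKey;
  set get;
  if not missing(AppNo) then AppKey = AppNo;
  if not missing(AppKey);   /* drop rows that come before the first application */
run;

proc sort data=keyed;
  by AppKey Page;
run;

/* UPDATE with an empty master keeps, for each AppKey, the last
   non-missing value of every variable and writes one row per key. */
data final;
  update keyed(obs=0) keyed;
  by AppKey;
  drop AppKey;
run;
```

With this approach Page holds the last page number seen for the application, and the sort reorders the rows by application number. If the same application number can occur more than once in a file, a different key would be needed.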
ballardw
Super User

Please paste examples of a text file into a code box opened on the forum with the {I} icon. The main message window reformats pasted text, removing certain kinds of white space, and may insert HTML tags, so the text can end up quite different from what is actually in your file.

 

You should also include a minimum of two complete records, three would be better.

Describe exactly how we know that the block of data is for a new record or the end of a record.
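
If that is hard to judge by eye in a 2,000-page file, a quick diagnostic along these lines might help. It is only a sketch: it assumes a record begins at each line that starts with "Application No:" (unconfirmed), and it reuses the c:\temp\sampledata.txt path from the earlier reply. RecId, nlines and blocks are names made up here. It counts how many lines fall between consecutive record starts, so a single value in the PROC FREQ output would suggest a fixed layout.

```sas
filename get "c:\temp\sampledata.txt";

data blocks;
  infile get truncover end=eof;
  input;                       /* load the whole line into _infile_ */
  retain RecId 0 nlines 0;

  /* Assumed record start: a line beginning with "Application No:". */
  if find(_infile_, "Application No:", 'i') = 1 then do;
    if RecId > 0 then output;  /* write the count for the previous record */
    RecId + 1;
    nlines = 0;
  end;

  nlines + 1;
  if eof and RecId > 0 then output;  /* write the count for the last record */
  keep RecId nlines;
run;

/* One row per distinct line count; more than one row means the layout varies. */
proc freq data=blocks;
  tables nlines;
run;
```

Because the page header lines come before "Application No:", each count includes the header of the following record rather than its own. That shift is the same for every record, so the counts remain comparable.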

Is there a standard number of lines that each record occupies, or does it vary? If the content varies, then you really need to show examples of the different layouts. Since you show a field labeled "Report Name", I am very concerned that there might be other reports in the same file with a different layout.

 

It also would help to provide

1) expected variable names to create

2) properties for the variables such as maximum length

3) which repeated pieces of information shown need to be in the resulting data such as Report Name, Date Printed or Page number.

 
