DATA Step, Macro, Functions and more

Reading unsorted data from notepad file into SAS

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Reading unsorted data from notepad file into SAS

 

Dear Members,

 

 

I have a big text file and it contains 3 tables. The records in the file are unsorted, the records in the file are in line with the column name separated by space. The tables are repeating themselves until the end. I want to import the data from that notepad file under correct table into SAS. I want to read the records and column names from the text file and put them under correct table In SAS. I tried through INFILE and I was successful in importing the data into SAS, but since Columns are unsorted and tables are repeating themselves therefore it is harder to keep the records under correct table in SAS. I am a beginner to SAS, any help would be greatly appreciated. The sample file is attached for your reference.

 

Thanks

Ather


Accepted Solutions
Solution
‎07-22-2016 06:07 AM
SAS Super FREQ
Posts: 683

Re: Reading unsorted data from notepad file into SAS

Hi

 

You will need some time to do this, here is a basic program that I hope gets you started. See the comments in the code for an explanation what the codes does.

 

data want;
  infile "c:\temp\incomingTestFile.txt" truncover;

  * read a records;
  input;

  * check for type of table;
  if find(_infile_, "Table 1") then do;

    *
    * depending on the table start reading values
    * the @"someText" positions you rigth after the text
    * the @28 positions on col 28
    * varname informat, reads the value into a variable, carefull about the length
    * the slash "/" indicates "read a new line"
    *;
    input
      /
      @"Acct Number" acctNumber $30.
      @"ZIP Code" ZIPCode $30.
      /
      @"Floor Limit Indicator" floorLimitInd $8.
      @"State/Prov. Code" state_prov_code $20.
      /
      /
      /
      /
      /
      @28 purchaseDate yymmdd8.
    ;

    *
    * take out any control chars like tab etc. and take away leading and trailing spaces
    *;
    array xchar{*} _character_;

    drop i;
    do i = 1 to dim(xchar);
      xchar{i} = strip(compress(xchar{i},,"C"));
    end;

    * write to the output data sets;
    output;
  end;

  format 
    purchaseDate date9.
  ;
run;

Bruno

View solution in original post


All Replies
Super User
Posts: 5,257

Re: Reading unsorted data from notepad file into SAS

You could succeed if you invest quite some amount of time and effort by parsing this report style of file.
It would be substantially easier if you could get a data file instead from the source.
Data never sleeps
Occasional Contributor
Posts: 16

Re: Reading unsorted data from notepad file into SAS

Thank you for answering the question, we dont have source, we only get this file on daily basis in this format. The data is not stored anywhere except in this file and we want the data to be available with us to perform different analytics. This is the reason of importing the file in SAS. Thanks

 

Super User
Super User
Posts: 7,405

Re: Reading unsorted data from notepad file into SAS

Why are you trying to import a report file?  Go back to the source data - i.e. the data used to produce that report.  Otherwise you have quite a chunk of work on your hands, import each row, process it out etc.

Occasional Contributor
Posts: 16

Re: Reading unsorted data from notepad file into SAS

Thank you for answering the question, we dont have source, we only get this file on daily basis in this format. The data is not stored anywhere except in this file and we want the data to be available with us to perform different analytics. This is the reason of importing the file in SAS. Thanks

Super User
Super User
Posts: 7,405

Re: Reading unsorted data from notepad file into SAS

So you get this file regularly, but the supplier cannot provide either raw data access or a proper data transfer file?  I would go and check the contract you have with the data provider - this is a report file, it is not a data transfer.  You can try coding round it, but at each run you will likely be re-coding and re-doing the process.  It really is in your best interest and the supplier to agree a proper data transfer process. Even if that process it to drop the data to CSV at the same time the text file is created.

Super User
Posts: 6,945

Re: Reading unsorted data from notepad file into SAS

I'd do it like that:

one data step, with N tables in the DATA statement; use KEEP to only put the proper variables in the respective datasets

INFILE statement with end=done option

RETAIN all variables that appear in the KEEP lists for the output tables

also RETAIN a flag variable that holds which table is currently being read; set it to 0 initially.

read input lines into sufficently long string variable; discard lines that hold no real data (like those containing PAGE x) with subsetting if

when a "Table X" line is encountered:

 - if the flag variable is not zero (indicating that we are not starting the first table), OUTPUT to corresponding dataset (use a SELECT block)

 - set the flag variable accordingly

a SELECT(flag); block, where you have parsing statements for each table you are currently reading

when the "done" variable is set, do one final output of the currently read table (as you did when a new table started)

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 16

Re: Reading unsorted data from notepad file into SAS

Thanks Kurt for the suggestion, could you please elaborate more through SAS code, would be grateful to you. Thanks

Super User
Posts: 6,945

Re: Reading unsorted data from notepad file into SAS

Start easy:

data test (keep=acctno);
infile "your_test_file" truncover;
length input_line $200;
input input_line $200.;
if index(input_line,'Account') > 0 then do;
  acctno = input(substr(input_line,X,Y),Z.);
  output;
end;
run;

where X,Y,Z denote positions and length of the accountnumber within the report line.

Once you get this working, you already have one of the parsing statements. Then write more and more parsing statements.

The INDEX function will also be used to detect the lines that start a table, which will then help you create the SELECT blocks.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎07-22-2016 06:07 AM
SAS Super FREQ
Posts: 683

Re: Reading unsorted data from notepad file into SAS

Hi

 

You will need some time to do this, here is a basic program that I hope gets you started. See the comments in the code for an explanation what the codes does.

 

data want;
  infile "c:\temp\incomingTestFile.txt" truncover;

  * read a records;
  input;

  * check for type of table;
  if find(_infile_, "Table 1") then do;

    *
    * depending on the table start reading values
    * the @"someText" positions you rigth after the text
    * the @28 positions on col 28
    * varname informat, reads the value into a variable, carefull about the length
    * the slash "/" indicates "read a new line"
    *;
    input
      /
      @"Acct Number" acctNumber $30.
      @"ZIP Code" ZIPCode $30.
      /
      @"Floor Limit Indicator" floorLimitInd $8.
      @"State/Prov. Code" state_prov_code $20.
      /
      /
      /
      /
      /
      @28 purchaseDate yymmdd8.
    ;

    *
    * take out any control chars like tab etc. and take away leading and trailing spaces
    *;
    array xchar{*} _character_;

    drop i;
    do i = 1 to dim(xchar);
      xchar{i} = strip(compress(xchar{i},,"C"));
    end;

    * write to the output data sets;
    output;
  end;

  format 
    purchaseDate date9.
  ;
run;

Bruno

Super User
Posts: 9,682

Re: Reading unsorted data from notepad file into SAS

It is really not easy. There are too many scenarios you need to consider about :


filename xx '/folders/myfolders/incoming test file.txt';
data temp;
infile xx lrecl=2000 truncover length=len;
input x $varying2000. len;
if find(x,'INCOMING INTERCHANGE') or
   find(x,'SYSTEM DATE') or
   find(x,'RUN NO') or
   missing(x) then delete;
length tname $ 100;
retain tname;
if scan(x,-2)='Table' then do;tname=strip(scan(x,-1,'-'));delete;end;
run;

data temp1;
 set temp;
 temp1=strip(tranwrd(substr(x,1,55),'09'x,' '));
 p1=find(temp1,'  ');
 name1=substr(temp1,1,p1);
 value1=strip(substr(temp1,p1));
 
 temp2=strip(tranwrd(substr(x,56),'09'x,' '));
 p2=find(temp2,'  ');
 name2=substr(temp2,1,p2);
 value2=strip(substr(temp2,p2));

keep name1 value1 name2 value2 tname temp1 temp2;
run;
data want;
 set temp1;
 name=name1;value=value1;output;
 name=name2;value=value2;output;
keep name value tname;
run;

proc print noobs label;run;





OUTPUT:
tname	name	value
Table 1	Acct Number	xxxxxxxxxxxxxxxxxxx
Table 1	ZIP Code	xxxxx
Table 1	Floor Limit Indicator	x
Table 1	State/Prov. Code	AB
Table 1	CRB/Exception File Ind	 
Table 1	Requested Payment Service	 
Table 1	PCAS Indicator	x
Table 1	Number of Payment Forms	 
Table 1	Reference Nbr	xxxxxxxxxxxxxxxxxxxxxxx
Table 1	Usage Code	1
Table 1	Business ID	xxx
Table 1	Reason Code	00
Table 1	Purchase Date	20160119
Table 1	Flag	0
Table 1	Destination Amount	xxxxxxxxxxxx
Table 1	Characteristics Ind	N
Table 1	Currency Code	840
Table 1	Code	123456
Table 1	Source Amount	000000000100
Table 1	Capability	2
Table 1	Source Currency Code	840
Table 1	ID Method	1
Table 1	Name	ABCDEFGHIJKLMNOPQRSTUV
Table 1	Flag	 
Table 1	City	xxx-xxx-xxxx
Table 1	Entry Mode	01
Table 1	Country Code	US
Table 1	Processing Date	20160121
Table 1	Category Code	1234
Table 1	Attribute	0
Table 2	Format Code	 
Table 2	Trace Number	 
Table 2	Assurance Level	 
Table 2	Terminal Ind	 
Table 2	Reference Nbr	000000
Table 2	Indicator	 
Table 2	Indicator	 
Table 2	Development Field	0
Table 2	Message Text	 
Table 2	Response Code	 
Table 2	Condition Ind	 
Table 2	Source Code 5	 
Table 2	Program Indicator	 
Table 2	Id Format	 
Table 2	Charge	x
Table 2	Selection	0
Table 2	ID	XXXXXXXXXXXXXXX
Table 2	Payment Count	 
Table 2	Terminal ID	 
Table 2	Identifier	123456789
Table 2	Fee	000000000000 Ca
Table 2	shback	000000000
Table 2	Pymt Ind 7	 
Table 2	Condition Code	 
Table 2	Chargeback Ind	 
Table 2	Environment	 
Table 3	Identifier	123456789 Volume Indicator
Table 3	 	 
Table 3	Amount	000000000000
Table 3	Goods Indicator	 
Table 3	Currency Cd	123
Table 3	Verification Val	 
Table 3	Response Cd	 
Table 3	Fee Amount	000000000000000
Table 3	Code	 
Table 3	Fee Sign	 
Table 3	ID Rsn	 
Table 3	Curr Ex Rate	00000000
Table 3	Processing Code	 
Table 3	Curr Ex Rate	00000000
Table 3	Rights Ind	 
Table 3	ISA Amt	000000000000
Table 3	Clearing Seq Nbr 00	 
Table 3	Product Id	F
Table 3	Clearing Seq Cnt 00	 
Table 3	Program ID	 
Table 3	Auth Data	 
Table 3	Conv Ind	 
Table 3	Total Authorized Amount	000000000000
Table 3	Type Identification	 
Table 3	Information Ind	 
Table 3	Qualified Indicator	 
Table 3	Telephone Number	 
Table 3	Token	0000000000000000
Table 3	Data Indicator	 
Table 3	Result Code	 
Table 1	Acct Number	xxxxxxxxxxxxxxxxxxx
Table 1	ZIP Code	xxxxx
Table 1	Floor Limit Indicator	x
Table 1	State/Prov. Code	AB
Table 1	CRB/Exception File Ind	 
Table 1	Requested Payment Service	 
Table 1	PCAS Indicator	x
Table 1	Number of Payment Forms	 
Table 1	Reference Nbr	xxxxxxxxxxxxxxxxxxxxxxx
Table 1	Usage Code	1
Table 1	Business ID	xxx
Table 1	Reason Code	00
Table 1	Purchase Date	20160119
Table 1	Flag	0
Table 1	Destination Amount	xxxxxxxxxxxx
Table 1	Characteristics Ind	N
Table 1	Currency Code	840
Table 1	Code	123456
Table 1	Source Amount	000000000100
Table 1	Capability	2
Table 1	Source Currency Code	840
Table 1	ID Method	1
Table 1	Name	ABCDEFGHIJKLMNOPQRSTUV
Table 1	Flag	 
Table 1	City	xxx-xxx-xxxx
Table 1	Entry Mode	01
Table 1	Country Code	US Processing Dat
Table 1	e	20160121
Table 1	Category Code	1234
Table 1	Attribute	0
Table 2	Format Code	 
Table 2	Trace Number	 
Table 2	Assurance Level	 
Table 2	Terminal Ind	 
Table 2	Reference Nbr	000000
Table 2	Indicator	 
Table 2	Indicator	 
Table 2	Development Field	0
Table 2	Message Text	 
Table 2	Response Code	 
Table 2	Condition Ind	 
Table 2	Source Code 5	 
Table 2	Program Indicator	 
Table 2	Id Format	 
Table 2	Charge	x
Table 2	Selection	0
Table 2	ID	XXXXXXXXXXXXXXX
Table 2	Payment Count	 
Table 2	Terminal ID	 
Table 2	Identifier	123456789
Table 2	Fee	000000000000 Ca
Table 2	shback	000000000
Table 2	Pymt Ind 7	 
Table 2	Condition Code	 
Table 2	Chargeback Ind	 
Table 2	Environment	 
Table 3	Identifier	123456789 Volume Indicator
Table 3	 	 
Table 3	Amount	000000000000
Table 3	Goods Indicator	 
Table 3	Currency Cd	123
Table 3	Verification Val	 
Table 3	Response Cd	 
Table 3	Fee Amount	000000000000000
Table 3	Code	 
Table 3	Fee Sign	 
Table 3	ID Rsn	 
Table 3	Curr Ex Rate	00000000
Table 3	Processing Code	 
Table 3	Curr Ex Rate	00000000
Table 3	Rights Ind	 
Table 3	ISA Amt	000000000000
Table 3	Clearing Seq Nbr 00	 
Table 3	Product Id	F
Table 3	Clearing Seq Cnt 00	 
Table 3	Program ID	 
Table 3	Auth Data	 
Table 3	Conv Ind	 
Table 3	Total Authorized Amount	000000000000
Table 3	Type Identification	 
Table 3	Information Ind	 
Table 3	Qualified Indicator	 
Table 3	Telephone Number	 
Table 3	Token	 
Table 3	Additional Data Indicator	 
Table 3	CVV2 Result Code	 
Table 3	Chargeback Ind	 
Table 3	Environment	 


☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 469 views
  • 0 likes
  • 6 in conversation