Long text field that needs to be parse into different records

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Long text field that needs to be parse into different records

I have a text field that has course objectives in it that are numbered. They are variable length text data between increasing numbers that do not have the same number of objectives. Is there any way to parse this data with SAS to make a new record for each numbered objective?  I have included the proc contents of the file & a sample of what the text field looks like.

 

 

The SAS System

 

The CONTENTS Procedure

Data Set Name

WORK.OBJECTIVES

Observations

709

Member Type

DATA

Variables

5

Engine

V9

Indexes

0

Created

02/02/2016 12:11:14

Observation Length

1045

Last Modified

02/02/2016 12:11:14

Deleted Observations

0

Protection

 

Compressed

NO

Data Set Type

 

Sorted

NO

Label

 

 

 

Data Representation

WINDOWS_32

 

 

Encoding

wlatin1 Western (Windows)

 

 

 

 

Alphabetic List of Variables and Attributes

#

Variable

Type

Len

Format

Informat

Label

2

SCRSYLO_CRSE_NUMB

Char

5

$5.

$5.

SCRSYLO_CRSE_NUMB

5

SCRSYLO_LEARNING_OBJECTIVES

Char

1024

$1024.

$1024.

SCRSYLO_LEARNING_OBJECTIVES

1

SCRSYLO_SUBJ_CODE

Char

4

$4.

$4.

SCRSYLO_SUBJ_CODE

3

SCRSYLO_TERM_CODE_EFF

Char

6

$6.

$6.

SCRSYLO_TERM_CODE_EFF

4

SCRSYLO_TERM_CODE_END

Char

6

$6.

$6.

SCRSYLO_TERM_CODE_END

 

Sample of scrsylo_learning_objectives:

  1. Apply the generally accepted accounting principles and techniques to the process of analyzing and recording financial data.
  2. Prepare the end of period financial statements for service and merchandising enterprises. 
  3. Demonstrate an understanding in greater depth of accounting for assets, current liabilities, and payroll.

Accepted Solutions
Solution
‎02-03-2016 10:24 AM
Respected Advisor
Posts: 4,609

Re: Long text field that needs to be parse into different records

Pattern matching can be useful here

data text;
infile datalines truncover;
input txt $1000.;
datalines;
1. Apply the generally accepted etc. data.  2. Prepare the end...  3. Demonstrate, current liabilities, and payroll.
1) Do this. And that too. 2) And nothing else. 4) But don't forget anything.
;

data want;
length objNo 8 Objective $512;
if not prx1 then prx1 + prxparse("/\d+.+?\.+\s*(?=(\d|$))/");
set text;
start = 1; stop = -1; objNo = 1;
call prxnext(prx1, start, stop, txt, pos, len); 
if pos = 0 then do; /* Pattern not found, assume single objective */
    objective = txt;
    output;
    end;
else
    do objNo = 1 by 1 while (pos > 0);
        objective = substr(txt, pos, len);
        output;
        call prxnext(prx1, start, stop, txt, pos, len); 
        end;
drop prx1 start stop pos len txt;
run;

proc print data=want noobs; run;
PG

View solution in original post


All Replies
Grand Advisor
Posts: 17,438

Re: Long text field that needs to be parse into different records

Can the text field ever have numbers?

 

And/or are they always a single sentence? If so, you could try SCAN() with period as a delimiter.

Occasional Contributor
Posts: 16

Re: Long text field that needs to be parse into different records

Not that I know of.
Respected Advisor
Posts: 3,775

Re: Long text field that needs to be parse into different records

If they are always numberd nicely as in your example this will work.

filename FT15F001 temp lrecl=512;
data string;
   infile FT15F001 truncover;
   input string $512.;
   length line $256;
   do i = 1 to 10;
      s = findw(string,cats(i,.));
      e = findw(string,cats(i+1,.));
      *put s= e=;
      if s eq 0 then leave;
      if e eq 0 then line=substrn(string,s);
      else line=substrn(string,s,e-1);
      output;
      end;
   drop string;
   parmcards4;
1. Apply the generally accepted accounting principles and techniques to the process of analyzing and recording financial data.  2. Prepare the end of period financial statements for service and merchandising enterprises.  3. Demonstrate an understanding in greater depth of accounting for assets, current liabilities, and payroll.
1. Apply the generally accepted accand recording financial data.  2. Prepare the eial statements for service and merchandising enterprises. 
;;;;
   run;
proc print;
   run;

Capture.PNG

Occasional Contributor
Posts: 16

Re: Long text field that needs to be parse into different records

How do you modify this for text that will need to be pulled in from access and should be in a SAS table?
Respected Advisor
Posts: 3,775

Re: Long text field that needs to be parse into different records

Do you mean can the data source be SET instead of INPUT?

 

 

data x;
   set y;
   do the the work
   run;

 

Solution
‎02-03-2016 10:24 AM
Respected Advisor
Posts: 4,609

Re: Long text field that needs to be parse into different records

Pattern matching can be useful here

data text;
infile datalines truncover;
input txt $1000.;
datalines;
1. Apply the generally accepted etc. data.  2. Prepare the end...  3. Demonstrate, current liabilities, and payroll.
1) Do this. And that too. 2) And nothing else. 4) But don't forget anything.
;

data want;
length objNo 8 Objective $512;
if not prx1 then prx1 + prxparse("/\d+.+?\.+\s*(?=(\d|$))/");
set text;
start = 1; stop = -1; objNo = 1;
call prxnext(prx1, start, stop, txt, pos, len); 
if pos = 0 then do; /* Pattern not found, assume single objective */
    objective = txt;
    output;
    end;
else
    do objNo = 1 by 1 while (pos > 0);
        objective = substr(txt, pos, len);
        output;
        call prxnext(prx1, start, stop, txt, pos, len); 
        end;
drop prx1 start stop pos len txt;
run;

proc print data=want noobs; run;
PG
Respected Advisor
Posts: 3,775

Re: Long text field that needs to be parse into different records

@PGStats thinking about your nice RegEX I thought perhaps instead of capturing the sentence you could edit in a delimiter and SCAN.  Same but different so to speak.

filename FT15F001 temp lrecl=512;
data string;
   infile FT15F001 truncover;
   input string $char512.-l;
   length line $256;
   if _n_ eq 1 then rx = prxparse("s/(\d+.+?\.)/"||'ff'x||"$1/");
   retain rx;
   string = prxchange(rx,-1,string);
   do i = 1 by 1;
      line = scan(string,i,'ff'x);
      if missing(line) then leave;
      output;
      end;
   parmcards4;
1. Apply the generally accepted accounting principles and techniques to the process of analyzing and recording financial data.  2. Prepare the end of period financial statements for service and merchandising enterprises.  3. Demonstrate an understanding in greater depth of accounting for assets, current liabilities, and payroll.
   1) Apply the generally accepted accand recording financial data.  2. Prepare the eial statements for service and merchandising enterprises. 
This one has no numbers
;;;;
   run;
proc print;
   run;
Respected Advisor
Posts: 4,609

Re: Long text field that needs to be parse into different records

Good idea @data_null__, that way you don't have to detect the end of each objective with the pattern. I would prefer

 

      if not missing(line) then output;
      else if i > 1 then leave;

in the output loop for the case where the string has leading blanks.

PG
Respected Advisor
Posts: 3,775

Re: Long text field that needs to be parse into different records

Super!
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 456 views
  • 0 likes
  • 4 in conversation