DATA Step, Macro, Functions and more

Pearl functions for data extraction to Excel

Accepted Solution Solved
Reply
Contributor
Posts: 63
Accepted Solution

Pearl functions for data extraction to Excel

Can Pearl functions be used to extract data from notepad and populate in Excel.

Data looks like this ->

------------------------------------------------------------------------

PRODUCT

Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08361

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

MN, WI, IA, MO, ND, SD, MT

___________________________________

and then repeats in the same format with a separating line

I want data to be in Excel format

PRODUCT               CODE          RECALLING FIRM.               REASON                         VOLUME                         DISTRIBUTION

Thanks


Accepted Solutions
Solution
‎07-22-2014 11:00 PM
Respected Advisor
Posts: 4,646

Re: Pearl functions for data extraction to Excel

I would start with something like this:

data test;

infile datalines truncover;

length topic $32 content $200;

input topic $32.;

if not missing(topic) and anylower(trim(topic)) = 0 then do;

    prodNo + topic = "PRODUCT";

    input content $200.;

    output;

    end;

datalines;

PRODUCT

Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08361

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

MN, WI, IA, MO, ND, SD, MT

PRODUCT

Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08362

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

TX

;

proc transpose data=test out=want(drop=_name_ prodNo);

by prodNo;

id topic;

var content;

run;

PG

PG

View solution in original post


All Replies
Super User
Posts: 17,824

Re: Pearl functions for data extraction to Excel

If your structure is consistently like that you should look at the examples with a BASE SAS input, no Perl needed.

Take a look at Method #2

Step-by-Step Programming with Base SAS(R) Software

Contributor
Posts: 63

Re: Pearl functions for data extraction to Excel

Thanks Reeza,

but the position of the data is not fixed. Also the content varies. The only hope is looking for headers that are fixed (and upper case)

Solution
‎07-22-2014 11:00 PM
Respected Advisor
Posts: 4,646

Re: Pearl functions for data extraction to Excel

I would start with something like this:

data test;

infile datalines truncover;

length topic $32 content $200;

input topic $32.;

if not missing(topic) and anylower(trim(topic)) = 0 then do;

    prodNo + topic = "PRODUCT";

    input content $200.;

    output;

    end;

datalines;

PRODUCT

Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08361

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

MN, WI, IA, MO, ND, SD, MT

PRODUCT

Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08362

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

TX

;

proc transpose data=test out=want(drop=_name_ prodNo);

by prodNo;

id topic;

var content;

run;

PG

PG
Contributor
Posts: 63

Re: Pearl functions for data extraction to Excel

Thanks, this sounds doable. I wasted so much time experimenting with pearl functions.

And yes, I just tested the code. Its working fine. Amazing !!!!

Valued Guide
Posts: 3,208

Re: Pearl functions for data extraction to Excel

You have a perfect ordered data as indicated by the headers.  Processing data of that is one of the most powerfull features of SAS.

PGstats is giving a nice start. What is holding you off?

---->-- ja karman --<-----
Contributor
Posts: 63

Re: Pearl functions for data extraction to Excel

PG Stats - How do I handle lines between the two records  '------------------'

Respected Advisor
Posts: 4,646

Re: Pearl functions for data extraction to Excel

To skip those lines, require that topic contain at least one uppercase character. Replace

if not missing(topic) and anylower(trim(topic)) = 0 then do;


with


if anyupper(topic) > 0 and not missing(topic) and anylower(trim(topic)) = 0 then do;


PG

PG
Contributor
Posts: 63

Re: Pearl functions for data extraction to Excel

PG stats

I am trying to create a super heading that repeats say after 15 or 20 records

topic1 is my super heading.

somehow its not populating topic1,

data test;

infile 'ABC' truncover;

length topic1 $60 topic $60 content $1000 ;

  input topic $60.;

       

if anyupper(topic) > 0 and anydigit(topic) = 0 and anypunct(trim(topic)) and anyspace(trim(topic)) and not missing(topic) and anylower(trim(topic)) = 0 then        do;

  topic1 = topic;

end;

if anyupper(topic) > 0 and anydigit(topic) = 0 and not anypunct(trim(topic)) and not anyspace(trim(topic)) and not missing(topic) and anylower(trim(topic)) = 0 then do;

prodNo + topic = "PRODUCT";

input content $1000.;

output;

end;

Respected Advisor
Posts: 4,646

Re: Pearl functions for data extraction to Excel

I guess your condition for detecting super headings is not quite right. Please show examples of super headings or describe how they should be different from headings. - PG

PG
Contributor
Posts: 63

Re: Pearl functions for data extraction to Excel

PG Stats here is the super heading


there are 6 types of super heading with the same format. Number of records per super heading can vary.


RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS I

RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS II

RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS III

RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS I

RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS II

RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS III


here is the data format


RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS I


PRODUCT

CODE

RECALLING FIRM/MANUFACTURER

REASON

VOLUME OF PRODUCT IN COMMERCE

DISTRIBUTION

PRODUCT

CODE

RECALLING FIRM/MANUFACTURER

REASON

VOLUME OF PRODUCT IN COMMERCE

DISTRIBUTION

RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS II

PRODUCT

CODE

RECALLING FIRM/MANUFACTURER

REASON

VOLUME OF PRODUCT IN COMMERCE

DISTRIBUTION

PRODUCT

CODE

RECALLING FIRM/MANUFACTURER

REASON

VOLUME OF PRODUCT IN COMMERCE

DISTRIBUTION


RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS III


etc



Respected Advisor
Posts: 4,646

Re: Pearl functions for data extraction to Excel

I think this would read it properly

data test;

infile 'ABC' truncover;

length topic1 $60 topic $60 content $1000 ;

retain topic1;

input topic $60.;

/* Skip separation lines and out of order non-topic lines */ 

if anyupper(topic) > 0 and not missing(topic) and anylower(trim(topic)) = 0 then do;

/* Check for super topic lines */

  if anydigit(topic) > 0 and anypunct(topic) > 0 then topic1 = topic;

  else do;

/* Read content line */

    prodNo + topic = "PRODUCT";

    input content $1000.;

    output;

    end;

end;

run;

PG

PG
Valued Guide
Posts: 3,208

Re: Pearl functions for data extraction to Excel

buckeyefisher, changing a program is understanding the concepts.
Pgstats did a fast one the assumptions that:
- every header is recognized by the text "Product"
every part of information is structured by a (capped) topic indicator followed by a line with the content.

Reading this first as topic/content it can be transposed to get a table layout   

Changing conditions can break this approach.

There is a amazing log possible with the input statement.

- Assumption PRODUCT is the start of a fixed block/oreder of information you can do it like.

  There is no check on correct order or topics in that block,

  With some if-subseting and more logic that can be done

  Check for (@ holding record , / next record and more)

data wantjk (drop=header) ;
infile datalines truncover;
length header $32 ;
Length PRODUCT $200
       CODE $32
       RECALLING_FIRM_MANUFACTURER $200
       REASON $200
       VOLUME_OF_PRODUCT_IN_COMMERCE $200
       DISTRIBUTION $32
   ;      

input  header $32. ;
if header = "PRODUCT" Then do;
    input PRODUCT $200. //
          CODE $32. //
          RECALLING_FIRM_MANUFACTURER $200. //
          REASON $200.  //
          VOLUME_OF_PRODUCT_IN_COMMERCE $200. //
          DISTRIBUTION $32.
          ;
    output;
    end;

datalines;
PRODUCT
Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08361
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
MN, WI, IA, MO, ND, SD, MT

PRODUCT
Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08362
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
TX
;
run;

---->-- ja karman --<-----
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 518 views
  • 0 likes
  • 4 in conversation