BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
buckeyefisher
Obsidian | Level 7

Can Pearl functions be used to extract data from notepad and populate in Excel.

Data looks like this ->

------------------------------------------------------------------------

PRODUCT

Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08361

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

MN, WI, IA, MO, ND, SD, MT

___________________________________

and then repeats in the same format with a separating line

I want data to be in Excel format

PRODUCT               CODE          RECALLING FIRM.               REASON                         VOLUME                         DISTRIBUTION

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I would start with something like this:

data test;

infile datalines truncover;

length topic $32 content $200;

input topic $32.;

if not missing(topic) and anylower(trim(topic)) = 0 then do;

    prodNo + topic = "PRODUCT";

    input content $200.;

    output;

    end;

datalines;

PRODUCT

Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08361

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

MN, WI, IA, MO, ND, SD, MT

PRODUCT

Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08362

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

TX

;

proc transpose data=test out=want(drop=_name_ prodNo);

by prodNo;

id topic;

var content;

run;

PG

PG

View solution in original post

12 REPLIES 12
Reeza
Super User

If your structure is consistently like that you should look at the examples with a BASE SAS input, no Perl needed.

Take a look at Method #2

Step-by-Step Programming with Base SAS(R) Software

buckeyefisher
Obsidian | Level 7

Thanks Reeza,

but the position of the data is not fixed. Also the content varies. The only hope is looking for headers that are fixed (and upper case)

PGStats
Opal | Level 21

I would start with something like this:

data test;

infile datalines truncover;

length topic $32 content $200;

input topic $32.;

if not missing(topic) and anylower(trim(topic)) = 0 then do;

    prodNo + topic = "PRODUCT";

    input content $200.;

    output;

    end;

datalines;

PRODUCT

Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08361

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

MN, WI, IA, MO, ND, SD, MT

PRODUCT

Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012

CODE

Date code 08362

RECALLING FIRM/MANUFACTURER

Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.

REASON

The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.

VOLUME OF PRODUCT IN COMMERCE

Unknown

DISTRIBUTION

TX

;

proc transpose data=test out=want(drop=_name_ prodNo);

by prodNo;

id topic;

var content;

run;

PG

PG
buckeyefisher
Obsidian | Level 7

Thanks, this sounds doable. I wasted so much time experimenting with pearl functions.

And yes, I just tested the code. Its working fine. Amazing !!!!

jakarman
Barite | Level 11

You have a perfect ordered data as indicated by the headers.  Processing data of that is one of the most powerfull features of SAS.

PGstats is giving a nice start. What is holding you off?

---->-- ja karman --<-----
buckeyefisher
Obsidian | Level 7

PG Stats - How do I handle lines between the two records  '------------------'

PGStats
Opal | Level 21

To skip those lines, require that topic contain at least one uppercase character. Replace

if not missing(topic) and anylower(trim(topic)) = 0 then do;


with


if anyupper(topic) > 0 and not missing(topic) and anylower(trim(topic)) = 0 then do;


PG

PG
buckeyefisher
Obsidian | Level 7

PG stats

I am trying to create a super heading that repeats say after 15 or 20 records

topic1 is my super heading.

somehow its not populating topic1,

data test;

infile 'ABC' truncover;

length topic1 $60 topic $60 content $1000 ;

  input topic $60.;

       

if anyupper(topic) > 0 and anydigit(topic) = 0 and anypunct(trim(topic)) and anyspace(trim(topic)) and not missing(topic) and anylower(trim(topic)) = 0 then        do;

  topic1 = topic;

end;

if anyupper(topic) > 0 and anydigit(topic) = 0 and not anypunct(trim(topic)) and not anyspace(trim(topic)) and not missing(topic) and anylower(trim(topic)) = 0 then do;

prodNo + topic = "PRODUCT";

input content $1000.;

output;

end;

PGStats
Opal | Level 21

I guess your condition for detecting super headings is not quite right. Please show examples of super headings or describe how they should be different from headings. - PG

PG
buckeyefisher
Obsidian | Level 7

PG Stats here is the super heading


there are 6 types of super heading with the same format. Number of records per super heading can vary.


RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS I

RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS II

RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS III

RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS I

RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS II

RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS III


here is the data format


RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS I


PRODUCT

CODE

RECALLING FIRM/MANUFACTURER

REASON

VOLUME OF PRODUCT IN COMMERCE

DISTRIBUTION

PRODUCT

CODE

RECALLING FIRM/MANUFACTURER

REASON

VOLUME OF PRODUCT IN COMMERCE

DISTRIBUTION

RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS II

PRODUCT

CODE

RECALLING FIRM/MANUFACTURER

REASON

VOLUME OF PRODUCT IN COMMERCE

DISTRIBUTION

PRODUCT

CODE

RECALLING FIRM/MANUFACTURER

REASON

VOLUME OF PRODUCT IN COMMERCE

DISTRIBUTION


RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS III


etc



PGStats
Opal | Level 21

I think this would read it properly

data test;

infile 'ABC' truncover;

length topic1 $60 topic $60 content $1000 ;

retain topic1;

input topic $60.;

/* Skip separation lines and out of order non-topic lines */ 

if anyupper(topic) > 0 and not missing(topic) and anylower(trim(topic)) = 0 then do;

/* Check for super topic lines */

  if anydigit(topic) > 0 and anypunct(topic) > 0 then topic1 = topic;

  else do;

/* Read content line */

    prodNo + topic = "PRODUCT";

    input content $1000.;

    output;

    end;

end;

run;

PG

PG
jakarman
Barite | Level 11

buckeyefisher, changing a program is understanding the concepts.
Pgstats did a fast one the assumptions that:
- every header is recognized by the text "Product"
every part of information is structured by a (capped) topic indicator followed by a line with the content.

Reading this first as topic/content it can be transposed to get a table layout   

Changing conditions can break this approach.

There is a amazing log possible with the input statement.

- Assumption PRODUCT is the start of a fixed block/oreder of information you can do it like.

  There is no check on correct order or topics in that block,

  With some if-subseting and more logic that can be done

  Check for (@ holding record , / next record and more)

data wantjk (drop=header) ;
infile datalines truncover;
length header $32 ;
Length PRODUCT $200
       CODE $32
       RECALLING_FIRM_MANUFACTURER $200
       REASON $200
       VOLUME_OF_PRODUCT_IN_COMMERCE $200
       DISTRIBUTION $32
   ;      

input  header $32. ;
if header = "PRODUCT" Then do;
    input PRODUCT $200. //
          CODE $32. //
          RECALLING_FIRM_MANUFACTURER $200. //
          REASON $200.  //
          VOLUME_OF_PRODUCT_IN_COMMERCE $200. //
          DISTRIBUTION $32.
          ;
    output;
    end;

datalines;
PRODUCT
Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08361
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
MN, WI, IA, MO, ND, SD, MT

PRODUCT
Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08362
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
TX
;
run;

---->-- ja karman --<-----

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1472 views
  • 0 likes
  • 4 in conversation