Can Pearl functions be used to extract data from notepad and populate in Excel.
Data looks like this ->
------------------------------------------------------------------------
PRODUCT
Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08361
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
MN, WI, IA, MO, ND, SD, MT
___________________________________
and then repeats in the same format with a separating line
I want data to be in Excel format
PRODUCT CODE RECALLING FIRM. REASON VOLUME DISTRIBUTION
Thanks
I would start with something like this:
data test;
infile datalines truncover;
length topic $32 content $200;
input topic $32.;
if not missing(topic) and anylower(trim(topic)) = 0 then do;
prodNo + topic = "PRODUCT";
input content $200.;
output;
end;
datalines;
PRODUCT
Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08361
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
MN, WI, IA, MO, ND, SD, MT
PRODUCT
Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08362
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
TX
;
proc transpose data=test out=want(drop=_name_ prodNo);
by prodNo;
id topic;
var content;
run;
PG
If your structure is consistently like that you should look at the examples with a BASE SAS input, no Perl needed.
Take a look at Method #2
Thanks Reeza,
but the position of the data is not fixed. Also the content varies. The only hope is looking for headers that are fixed (and upper case)
I would start with something like this:
data test;
infile datalines truncover;
length topic $32 content $200;
input topic $32.;
if not missing(topic) and anylower(trim(topic)) = 0 then do;
prodNo + topic = "PRODUCT";
input content $200.;
output;
end;
datalines;
PRODUCT
Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08361
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
MN, WI, IA, MO, ND, SD, MT
PRODUCT
Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08362
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
TX
;
proc transpose data=test out=want(drop=_name_ prodNo);
by prodNo;
id topic;
var content;
run;
PG
Thanks, this sounds doable. I wasted so much time experimenting with pearl functions.
And yes, I just tested the code. Its working fine. Amazing !!!!
You have a perfect ordered data as indicated by the headers. Processing data of that is one of the most powerfull features of SAS.
PGstats is giving a nice start. What is holding you off?
PG Stats - How do I handle lines between the two records '------------------'
To skip those lines, require that topic contain at least one uppercase character. Replace
if not missing(topic) and anylower(trim(topic)) = 0 then do;
with
if anyupper(topic) > 0 and not missing(topic) and anylower(trim(topic)) = 0 then do;
PG
PG stats
I am trying to create a super heading that repeats say after 15 or 20 records
topic1 is my super heading.
somehow its not populating topic1,
data test;
infile 'ABC' truncover;
length topic1 $60 topic $60 content $1000 ;
input topic $60.;
if anyupper(topic) > 0 and anydigit(topic) = 0 and anypunct(trim(topic)) and anyspace(trim(topic)) and not missing(topic) and anylower(trim(topic)) = 0 then do;
topic1 = topic;
end;
if anyupper(topic) > 0 and anydigit(topic) = 0 and not anypunct(trim(topic)) and not anyspace(trim(topic)) and not missing(topic) and anylower(trim(topic)) = 0 then do;
prodNo + topic = "PRODUCT";
input content $1000.;
output;
end;
I guess your condition for detecting super headings is not quite right. Please show examples of super headings or describe how they should be different from headings. - PG
PG Stats here is the super heading
there are 6 types of super heading with the same format. Number of records per super heading can vary.
RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS I 
RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS II 
RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS III 
RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS I 
RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS II 
RECALLS AND FIELD CORRECTIONS: DEVICES - CLASS III 
here is the data format
RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS I 
PRODUCT
CODE
RECALLING FIRM/MANUFACTURER
REASON
VOLUME OF PRODUCT IN COMMERCE
DISTRIBUTION
PRODUCT
CODE
RECALLING FIRM/MANUFACTURER
REASON
VOLUME OF PRODUCT IN COMMERCE
DISTRIBUTION
RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS II
PRODUCT
CODE
RECALLING FIRM/MANUFACTURER
REASON
VOLUME OF PRODUCT IN COMMERCE
DISTRIBUTION
PRODUCT
CODE
RECALLING FIRM/MANUFACTURER
REASON
VOLUME OF PRODUCT IN COMMERCE
DISTRIBUTION
RECALLS AND FIELD CORRECTIONS: DRUGS - CLASS III
etc
I think this would read it properly
data test;
infile 'ABC' truncover;
length topic1 $60 topic $60 content $1000 ;
retain topic1;
input topic $60.;
/* Skip separation lines and out of order non-topic lines */
if anyupper(topic) > 0 and not missing(topic) and anylower(trim(topic)) = 0 then do;
/* Check for super topic lines */
if anydigit(topic) > 0 and anypunct(topic) > 0 then topic1 = topic;
else do;
/* Read content line */
prodNo + topic = "PRODUCT";
input content $1000.;
output;
end;
end;
run;
PG
buckeyefisher, changing a program is understanding the concepts. 
Pgstats did a fast one the assumptions that:
- every header is recognized by the text "Product"
every part of information is structured by a (capped) topic indicator followed by a line with the content.
Reading this first as topic/content it can be transposed to get a table layout
Changing conditions can break this approach.
There is a amazing log possible with the input statement.
- Assumption PRODUCT is the start of a fixed block/oreder of information you can do it like.
There is no check on correct order or topics in that block,
With some if-subseting and more logic that can be done
Check for (@ holding record , / next record and more)
data wantjk (drop=header) ;
infile datalines truncover;
length header $32 ;
Length PRODUCT $200
       CODE $32 
       RECALLING_FIRM_MANUFACTURER $200 
       REASON $200
       VOLUME_OF_PRODUCT_IN_COMMERCE $200
       DISTRIBUTION $32
   ;       
input  header $32. ;
if header = "PRODUCT" Then do;
    input PRODUCT $200. //
          CODE $32. //
          RECALLING_FIRM_MANUFACTURER $200. //
          REASON $200.  //
          VOLUME_OF_PRODUCT_IN_COMMERCE $200. //
          DISTRIBUTION $32.
          ;
    output;
    end;
datalines;
PRODUCT
Natural pistachios packaged in 4 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08361
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 1, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
MN, WI, IA, MO, ND, SD, MT
 
PRODUCT
Natural pistachios packaged in 6 oz. bags, UPC 076056 09460. Recall # F-1267-2012
CODE
Date code 08362
RECALLING FIRM/MANUFACTURER
Fritz Company, Inc., Newport, MN, by e-mail on April 2, 2009. FDA initiated recall is complete.
REASON
The product was manufactured using pistachios recalled by Setton Pistachio Company of Terra Bella because it has the potential to be contaminated with Salmonella.
VOLUME OF PRODUCT IN COMMERCE
Unknown
DISTRIBUTION
TX
;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
