Programming the statistical procedures from SAS

Extracting a part of the string

Reply
Occasional Contributor
Posts: 7

Extracting a part of the string

Hi y'all,

I am trying to extract a number stored as a percentage (in most cases) in a string. Not all rows have the information (for example the last row in the list below), and those that have may have different positions in the string for this information. And the sign "%" is missing for some.

I would appreciate any suggestions.

Thank you in advance.

0.0028
"XXXXX U.S. FUNDING DCP, 08/01/2012   /   " 0.140000%
"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
0.450% Due 12/06/12
14633 - 0.520%
16438 0.65% 01/26/13
XXXXXX FINANCE LTD (Q21, Q) 0.646852%
BANK OF XXXXXXX .12% YCD 0.12%
BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
XX&X CORP
Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode
Xxxxx Xxxxxx Capital Co.;0.501%
Xxxxx Xxxxx Xxxxxx, 0.4%
X X X X  X X X 0.375 12/28/2012
XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
XXX XXXX XX XXXXXX XX  2012-8-17  0.0008
XXXXXXX XXXXX XXXXXX XXXX      00.2300 01/14/2013
XXX XXXXXXX CP 0.17%
XXXXXXX XXXXXXX CORP. 0.220
In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012
XXXX BANK-NEW YORK 0.4
XXXXX Finance Corp. 0%
XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A
Xxx Xxxxxx C D 0.500 09/04/2012
REPO XXXXX XXXXX 0.31 08/22/2012
XXXXX XXXXX CORP00.190% 10/10/2012 CP
Xxxxxx Xxxx Xxxx 0.300%
Var .310% Due 01/24/13
XXXXX XXXX CORP.-0.580
XXX XXX XXX CO 98137RH29 DCP 08/02/2012
PROC Star
Posts: 7,417

Re: Extracting a part of the string

Ozzy,

Here is one way to get what I think you are trying to accomplish:

data want;

  length word $25;

  input word @@;

  number=input(word,?? 32.);

  if missing(number) then number=input(word,?? percent32.);

  if not missing(number) and number lt 1 then output;

  datalines4;

0.0028

"XXXXX U.S. FUNDING DCP, 08/01/2012   /   " 0.140000%

"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%

0.450% Due 12/06/12

14633 - 0.520%

16438 0.65% 01/26/13

XXXXXX FINANCE LTD (Q21, Q) 0.646852%

BANK OF XXXXXXX .12% YCD 0.12%

BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP

BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52

XX&X CORP

Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%

Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode

Xxxxx Xxxxxx Capital Co.;0.501%

Xxxxx Xxxxx Xxxxxx, 0.4%

X X X X  X X X 0.375 12/28/2012

XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013

XXX XXXX XX XXXXXX XX  2012-8-17  0.0008

XXXXXXX XXXXX XXXXXX XXXX      00.2300 01/14/2013

XXX XXXXXXX CP 0.17%

XXXXXXX XXXXXXX CORP. 0.220

In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012

XXXX BANK-NEW YORK 0.4

XXXXX Finance Corp. 0%

XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A

Xxx Xxxxxx C D 0.500 09/04/2012

REPO XXXXX XXXXX 0.31 08/22/2012

XXXXX XXXXX CORP00.190% 10/10/2012 CP

Xxxxxx Xxxx Xxxx 0.300%

Var .310% Due 01/24/13

XXXXX XXXX CORP.-0.580

XXX XXX XXX CO 98137RH29 DCP 08/02/2012

;;;;

Super User
Posts: 5,258

Re: Extracting a part of the string

Art.

Very nice approach.  There are still a few cases that might cause trouble, where the number appears but without a blank before it.  But this is well on its way.


PROC Star
Posts: 7,417

Re: Extracting a part of the string

That is easy to fix:

data want;

  length word $25;

  input word @@;

  word=compress(word,".%","kd");

  number=input(word,?? 32.);

  if missing(number) then number=input(word,?? percent32.);

  if not missing(number) and number lt 1 then output;

  datalines4;

0.0028

"XXXXX U.S. FUNDING DCP, 08/01/2012   /   " 0.140000%

"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%

0.450% Due 12/06/12

14633 - 0.520%

16438 0.65% 01/26/13

XXXXXX FINANCE LTD (Q21, Q) 0.646852%

BANK OF XXXXXXX .12% YCD 0.12%

BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP

BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52

XX&X CORP

Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%

Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode

Xxxxx Xxxxxx Capital Co.;0.501%

Xxxxx Xxxxx Xxxxxx, 0.4%

X X X X  X X X 0.375 12/28/2012

XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013

XXX XXXX XX XXXXXX XX  2012-8-17  0.0008

XXXXXXX XXXXX XXXXXX XXXX      00.2300 01/14/2013

XXX XXXXXXX CP 0.17%

XXXXXXX XXXXXXX CORP. 0.220

In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012

XXXX BANK-NEW YORK 0.4

XXXXX Finance Corp. 0%

XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A

Xxx Xxxxxx C D 0.500 09/04/2012

REPO XXXXX XXXXX 0.31 08/22/2012

XXXXX XXXXX CORP00.190% 10/10/2012 CP

Xxxxxx Xxxx Xxxx 0.300%

Var .310% Due 01/24/13

XXXXX XXXX CORP.-0.580

XXX XXX XXX CO 98137RH29 DCP 08/02/2012

;;;;

Regular Contributor
Posts: 241

Re: Extracting a part of the string

This simple prx approach does the right thing except for one line: "XXXXX Finance Corp. 0%"

ods _all_ close;
ods listing;
options nocenter;
 
/* test data */
data one;
  infile cards truncover;
  input line $char100.;
cards4;
0.0028
"XXXXX U.S. FUNDING DCP, 08/01/2012   /   " 0.140000%
"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
0.450% Due 12/06/12
14633 - 0.520%
16438 0.65% 01/26/13
XXXXXX FINANCE LTD (Q21, Q) 0.646852%
BANK OF XXXXXXX .12% YCD 0.12%
BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
XX&X CORP
Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode
Xxxxx Xxxxxx Capital Co.;0.501%
Xxxxx Xxxxx Xxxxxx, 0.4%
X X X X  X X X 0.375 12/28/2012
XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
XXX XXXX XX XXXXXX XX  2012-8-17  0.0008
XXXXXXX XXXXX XXXXXX XXXX      00.2300 01/14/2013
XXX XXXXXXX CP 0.17%
XXXXXXX XXXXXXX CORP. 0.220
In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012
XXXX BANK-NEW YORK 0.4
XXXXX Finance Corp. 0%
XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A
Xxx Xxxxxx C D 0.500 09/04/2012
REPO XXXXX XXXXX 0.31 08/22/2012
XXXXX XXXXX CORP00.190% 10/10/2012 CP
Xxxxxx Xxxx Xxxx 0.300%
Var .310% Due 01/24/13
XXXXX XXXX CORP.-0.580
XXX XXX XXX CO 98137RH29 DCP 08/02/2012
U.S. TREASURY NOTES-4.327
MUNICIPAL GAS AUTH-1.967
New York City NY GO 5.000000%
;;;;
run;
 
/* extract percentages */
data two;
  set one;
  length s $20;
  s = prxchange("s/^.*?((\-{0,1}\d*\.\d+)).*$/$1/", 1, line);
  if not missing(s) then pct = input(s, ?? best.);
  put (line pct) (/=) /;
run;
/*
line=0.0028
pct=0.0028
 
line="XXXXX U.S. FUNDING DCP, 08/01/2012   /   " 0.140000%
pct=0.14
 
line="Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
pct=0.43
 
line=0.450% Due 12/06/12
pct=0.45
 
line=14633 - 0.520%
pct=0.52
 
line=16438 0.65% 01/26/13
pct=0.65
 
line=XXXXXX FINANCE LTD (Q21, Q) 0.646852%
pct=0.646852
 
line=BANK OF XXXXXXX .12% YCD 0.12%
pct=0.12
 
line=BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
pct=0.09
 
line=BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
pct=0.52
 
line=XX&X CORP
pct=.
 
line=Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
pct=0.253506
 
line=Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, ten
der 08/
pct=0.45
 
line=Xxxxx Xxxxxx Capital Co.;0.501%
pct=0.501
 
line=Xxxxx Xxxxx Xxxxxx, 0.4%
pct=0.4
 
line=X X X X  X X X 0.375 12/28/2012
pct=0.375
 
line=XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
pct=0.42
 
line=XXX XXXX XX XXXXXX XX  2012-8-17  0.0008
pct=0.0008
 
line=XXXXXXX XXXXX XXXXXX XXXX      00.2300 01/14/2013
pct=0.23
 
line=XXX XXXXXXX CP 0.17%
pct=0.17
 
line=XXXXXXX XXXXXXX CORP. 0.220
pct=0.22
 
line=In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2
012
pct=0.16
 
line=XXXX BANK-NEW YORK 0.4
pct=0.4
 
line=XXXXX Finance Corp. 0%
pct=.
 
line=XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A
pct=0.14
 
line=Xxx Xxxxxx C D 0.500 09/04/2012
pct=0.5
 
line=REPO XXXXX XXXXX 0.31 08/22/2012
pct=0.31
 
line=XXXXX XXXXX CORP00.190% 10/10/2012 CP
pct=0.19
 
line=Xxxxxx Xxxx Xxxx 0.300%
pct=0.3
 
line=Var .310% Due 01/24/13
pct=0.31
 
line=XXXXX XXXX CORP.-0.580
pct=-0.58
 
line=XXX XXX XXX CO 98137RH29 DCP 08/02/2012
pct=.
 
line=U.S. TREASURY NOTES-4.327
pct=-4.327
 
line=MUNICIPAL GAS AUTH-1.967
pct=-1.967
 
line=New York City NY GO 5.000000%
pct=5
*/

Message was edited by: Chang Y. Chung in order to read the whole line (thanks data _null_!); to fix the prx to read the leading negative sign; and to add three more data lines with pct > 1.

Respected Advisor
Posts: 3,780

Re: Extracting a part of the string

I don't think you are reading the entire line.

data one;

infile cards truncover;

  input line $char100.;

cards4;

Occasional Contributor
Posts: 7

Re: Extracting a part of the string

Thank you very much all for the prompt and extremely useful responses.

Chang, your code works really great. I have a short follow-up question. What if several rows have values > 1; such as:

U.S. TREASURY NOTES-4.327

MUNICIPAL GAS AUTH-1.967

New York City NY GO 5.000000%

Thanks again.

Ozzy

Trusted Advisor
Posts: 1,301

Re: Extracting a part of the string

Similar to method by chang_y_chung@hotmail.com

data want;

if _n_=1 then do;

  retain _prxid;

  _prxid=prxparse('/0?\.\d+(?=[\s%])|\d+(?=%)|\d\.\d+/');

end;

infile cards dlm=' %';

input @;

call prxsubstr(_prxid,_infile_,_pos,_len);

if _pos ^=0 then

  input @_pos pct;

else

  put 'Did not find pct in obs=' _n_;

drop _:;

cards4;

0.0028

"XXXXX U.S. FUNDING DCP, 08/01/2012   /   " 0.140000%

"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%

0.450% Due 12/06/12

14633 - 0.520%

16438 0.65% 01/26/13

XXXXXX FINANCE LTD (Q21, Q) 0.646852%

BANK OF XXXXXXX .12% YCD 0.12%

BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP

BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52

XX&X CORP

Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%

Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode

Xxxxx Xxxxxx Capital Co.;0.501%

Xxxxx Xxxxx Xxxxxx, 0.4%

X X X X  X X X 0.375 12/28/2012

XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013

XXX XXXX XX XXXXXX XX  2012-8-17  0.0008

XXXXXXX XXXXX XXXXXX XXXX      00.2300 01/14/2013

XXX XXXXXXX CP 0.17%

XXXXXXX XXXXXXX CORP. 0.220

In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012

XXXX BANK-NEW YORK 0.4

XXXXX Finance Corp. 0%

XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A

Xxx Xxxxxx C D 0.500 09/04/2012

REPO XXXXX XXXXX 0.31 08/22/2012

XXXXX XXXXX CORP00.190% 10/10/2012 CP

Xxxxxx Xxxx Xxxx 0.300%

Var .310% Due 01/24/13

XXXXX XXXX CORP.-0.580

XXX XXX XXX CO 98137RH29 DCP 08/02/2012

U.S. TREASURY NOTES-4.327

MUNICIPAL GAS AUTH-1.967

;;;;

run;

0.0028

0.14

0.43

0.45

0.52

0.65

0.646852

0.12

0.09

0.52

.

0.253506

0.45

0.501

0.4

0.375

0.42

0.0008

0.23

0.17

0.22

0.16

0.4

0

0.14

0.5

0.31

0.19

0.3

0.31

0.58

.

4.327

1.967

Occasional Contributor
Posts: 7

Re: Extracting a part of the string


Arthur, Chang & FriedEgg,

I really appreciate your help. It works just fine.

Best,

Ozzy

Occasional Contributor
Posts: 7

Re: Extracting a part of the string

Fried,

I am trying to modify your code to allow for use in a SAS dataset stored as follows; but I coudn't get it to work yet. If you have the variables in a SAS dataset, rather in a text file, what shoud replace the infile functions?

Thanks again.

Typeidname
A1
"XXXXX U.S. FUNDING DCP, 08/01/2012   /   " 0.140000%
A2"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
A30.450% Due 12/06/12
B414633 - 0.520%
B516438 0.65% 01/26/13
B6XXXXXX FINANCE LTD (Q21, Q) 0.646852%
C7BANK OF XXXXXXX .12% YCD 0.12%
C8BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
C9BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
C10XX&X CORP
C11Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
C12Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode
C13Xxxxx Xxxxxx Capital Co.;0.501%
C14Xxxxx Xxxxx Xxxxxx, 0.4%
C15X X X X  X X X 0.375 12/28/2012
C16XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
C17XXX XXXX XX XXXXXX XX  2012-8-17  0.0008 
Ask a Question
Discussion stats
  • 9 replies
  • 559 views
  • 1 like
  • 6 in conversation