Hi y'all,
I am trying to extract a number stored as a percentage (in most cases) in a string. Not all rows have the information (for example the last row in the list below), and those that have may have different positions in the string for this information. And the sign "%" is missing for some.
I would appreciate any suggestions.
Thank you in advance.
0.0028 |
"XXXXX U.S. FUNDING DCP, 08/01/2012 / " 0.140000% |
"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000% |
0.450% Due 12/06/12 |
14633 - 0.520% |
16438 0.65% 01/26/13 |
XXXXXX FINANCE LTD (Q21, Q) 0.646852% |
BANK OF XXXXXXX .12% YCD 0.12% |
BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP |
BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52 |
XX&X CORP |
Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506% |
Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode |
Xxxxx Xxxxxx Capital Co.;0.501% |
Xxxxx Xxxxx Xxxxxx, 0.4% |
X X X X X X X 0.375 12/28/2012 |
XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013 |
XXX XXXX XX XXXXXX XX 2012-8-17 0.0008 |
XXXXXXX XXXXX XXXXXX XXXX 00.2300 01/14/2013 |
XXX XXXXXXX CP 0.17% |
XXXXXXX XXXXXXX CORP. 0.220 |
In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012 |
XXXX BANK-NEW YORK 0.4 |
XXXXX Finance Corp. 0% |
XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A |
Xxx Xxxxxx C D 0.500 09/04/2012 |
REPO XXXXX XXXXX 0.31 08/22/2012 |
XXXXX XXXXX CORP00.190% 10/10/2012 CP |
Xxxxxx Xxxx Xxxx 0.300% |
Var .310% Due 01/24/13 |
XXXXX XXXX CORP.-0.580 |
XXX XXX XXX CO 98137RH29 DCP 08/02/2012 |
Ozzy,
Here is one way to get what I think you are trying to accomplish:
data want;
length word $25;
input word @@;
number=input(word,?? 32.);
if missing(number) then number=input(word,?? percent32.);
if not missing(number) and number lt 1 then output;
datalines4;
0.0028
"XXXXX U.S. FUNDING DCP, 08/01/2012 / " 0.140000%
"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
0.450% Due 12/06/12
14633 - 0.520%
16438 0.65% 01/26/13
XXXXXX FINANCE LTD (Q21, Q) 0.646852%
BANK OF XXXXXXX .12% YCD 0.12%
BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
XX&X CORP
Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode
Xxxxx Xxxxxx Capital Co.;0.501%
Xxxxx Xxxxx Xxxxxx, 0.4%
X X X X X X X 0.375 12/28/2012
XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
XXX XXXX XX XXXXXX XX 2012-8-17 0.0008
XXXXXXX XXXXX XXXXXX XXXX 00.2300 01/14/2013
XXX XXXXXXX CP 0.17%
XXXXXXX XXXXXXX CORP. 0.220
In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012
XXXX BANK-NEW YORK 0.4
XXXXX Finance Corp. 0%
XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A
Xxx Xxxxxx C D 0.500 09/04/2012
REPO XXXXX XXXXX 0.31 08/22/2012
XXXXX XXXXX CORP00.190% 10/10/2012 CP
Xxxxxx Xxxx Xxxx 0.300%
Var .310% Due 01/24/13
XXXXX XXXX CORP.-0.580
XXX XXX XXX CO 98137RH29 DCP 08/02/2012
;;;;
Art.
Very nice approach. There are still a few cases that might cause trouble, where the number appears but without a blank before it. But this is well on its way.
data want;
length word $25;
input word @@;
word=compress(word,".%","kd");
number=input(word,?? 32.);
if missing(number) then number=input(word,?? percent32.);
if not missing(number) and number lt 1 then output;
datalines4;
0.0028
"XXXXX U.S. FUNDING DCP, 08/01/2012 / " 0.140000%
"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
0.450% Due 12/06/12
14633 - 0.520%
16438 0.65% 01/26/13
XXXXXX FINANCE LTD (Q21, Q) 0.646852%
BANK OF XXXXXXX .12% YCD 0.12%
BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
XX&X CORP
Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode
Xxxxx Xxxxxx Capital Co.;0.501%
Xxxxx Xxxxx Xxxxxx, 0.4%
X X X X X X X 0.375 12/28/2012
XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
XXX XXXX XX XXXXXX XX 2012-8-17 0.0008
XXXXXXX XXXXX XXXXXX XXXX 00.2300 01/14/2013
XXX XXXXXXX CP 0.17%
XXXXXXX XXXXXXX CORP. 0.220
In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012
XXXX BANK-NEW YORK 0.4
XXXXX Finance Corp. 0%
XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A
Xxx Xxxxxx C D 0.500 09/04/2012
REPO XXXXX XXXXX 0.31 08/22/2012
XXXXX XXXXX CORP00.190% 10/10/2012 CP
Xxxxxx Xxxx Xxxx 0.300%
Var .310% Due 01/24/13
XXXXX XXXX CORP.-0.580
XXX XXX XXX CO 98137RH29 DCP 08/02/2012
;;;;
This simple prx approach does the right thing except for one line: "XXXXX Finance Corp. 0%"
ods _all_ close;
ods listing;
options nocenter;
/* test data */
data one;
infile cards truncover;
input line $char100.;
cards4;
0.0028
"XXXXX U.S. FUNDING DCP, 08/01/2012 / " 0.140000%
"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
0.450% Due 12/06/12
14633 - 0.520%
16438 0.65% 01/26/13
XXXXXX FINANCE LTD (Q21, Q) 0.646852%
BANK OF XXXXXXX .12% YCD 0.12%
BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
XX&X CORP
Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode
Xxxxx Xxxxxx Capital Co.;0.501%
Xxxxx Xxxxx Xxxxxx, 0.4%
X X X X X X X 0.375 12/28/2012
XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
XXX XXXX XX XXXXXX XX 2012-8-17 0.0008
XXXXXXX XXXXX XXXXXX XXXX 00.2300 01/14/2013
XXX XXXXXXX CP 0.17%
XXXXXXX XXXXXXX CORP. 0.220
In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012
XXXX BANK-NEW YORK 0.4
XXXXX Finance Corp. 0%
XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A
Xxx Xxxxxx C D 0.500 09/04/2012
REPO XXXXX XXXXX 0.31 08/22/2012
XXXXX XXXXX CORP00.190% 10/10/2012 CP
Xxxxxx Xxxx Xxxx 0.300%
Var .310% Due 01/24/13
XXXXX XXXX CORP.-0.580
XXX XXX XXX CO 98137RH29 DCP 08/02/2012
U.S. TREASURY NOTES-4.327
MUNICIPAL GAS AUTH-1.967
New York City NY GO 5.000000%
;;;;
run;
/* extract percentages */
data two;
set one;
length s $20;
s = prxchange("s/^.*?((\-{0,1}\d*\.\d+)).*$/$1/", 1, line);
if not missing(s) then pct = input(s, ?? best.);
put (line pct) (/=) /;
run;
/*
line=0.0028
pct=0.0028
line="XXXXX U.S. FUNDING DCP, 08/01/2012 / " 0.140000%
pct=0.14
line="Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
pct=0.43
line=0.450% Due 12/06/12
pct=0.45
line=14633 - 0.520%
pct=0.52
line=16438 0.65% 01/26/13
pct=0.65
line=XXXXXX FINANCE LTD (Q21, Q) 0.646852%
pct=0.646852
line=BANK OF XXXXXXX .12% YCD 0.12%
pct=0.12
line=BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
pct=0.09
line=BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
pct=0.52
line=XX&X CORP
pct=.
line=Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
pct=0.253506
line=Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, ten
der 08/
pct=0.45
line=Xxxxx Xxxxxx Capital Co.;0.501%
pct=0.501
line=Xxxxx Xxxxx Xxxxxx, 0.4%
pct=0.4
line=X X X X X X X 0.375 12/28/2012
pct=0.375
line=XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
pct=0.42
line=XXX XXXX XX XXXXXX XX 2012-8-17 0.0008
pct=0.0008
line=XXXXXXX XXXXX XXXXXX XXXX 00.2300 01/14/2013
pct=0.23
line=XXX XXXXXXX CP 0.17%
pct=0.17
line=XXXXXXX XXXXXXX CORP. 0.220
pct=0.22
line=In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2
012
pct=0.16
line=XXXX BANK-NEW YORK 0.4
pct=0.4
line=XXXXX Finance Corp. 0%
pct=.
line=XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A
pct=0.14
line=Xxx Xxxxxx C D 0.500 09/04/2012
pct=0.5
line=REPO XXXXX XXXXX 0.31 08/22/2012
pct=0.31
line=XXXXX XXXXX CORP00.190% 10/10/2012 CP
pct=0.19
line=Xxxxxx Xxxx Xxxx 0.300%
pct=0.3
line=Var .310% Due 01/24/13
pct=0.31
line=XXXXX XXXX CORP.-0.580
pct=-0.58
line=XXX XXX XXX CO 98137RH29 DCP 08/02/2012
pct=.
line=U.S. TREASURY NOTES-4.327
pct=-4.327
line=MUNICIPAL GAS AUTH-1.967
pct=-1.967
line=New York City NY GO 5.000000%
pct=5
*/
Message was edited by: Chang Y. Chung in order to read the whole line (thanks data _null_!); to fix the prx to read the leading negative sign; and to add three more data lines with pct > 1.
I don't think you are reading the entire line.
data one;
infile cards truncover;
input line $char100.;
cards4;
Thank you very much all for the prompt and extremely useful responses.
Chang, your code works really great. I have a short follow-up question. What if several rows have values > 1; such as:
U.S. TREASURY NOTES-4.327
MUNICIPAL GAS AUTH-1.967
New York City NY GO 5.000000%
Thanks again.
Ozzy
Similar to method by chang_y_chung@hotmail.com
data want;
if _n_=1 then do;
retain _prxid;
_prxid=prxparse('/0?\.\d+(?=[\s%])|\d+(?=%)|\d\.\d+/');
end;
infile cards dlm=' %';
input @;
call prxsubstr(_prxid,_infile_,_pos,_len);
if _pos ^=0 then
input @_pos pct;
else
put 'Did not find pct in obs=' _n_;
drop _:;
cards4;
0.0028
"XXXXX U.S. FUNDING DCP, 08/01/2012 / " 0.140000%
"Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000%
0.450% Due 12/06/12
14633 - 0.520%
16438 0.65% 01/26/13
XXXXXX FINANCE LTD (Q21, Q) 0.646852%
BANK OF XXXXXXX .12% YCD 0.12%
BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP
BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52
XX&X CORP
Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506%
Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode
Xxxxx Xxxxxx Capital Co.;0.501%
Xxxxx Xxxxx Xxxxxx, 0.4%
X X X X X X X 0.375 12/28/2012
XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013
XXX XXXX XX XXXXXX XX 2012-8-17 0.0008
XXXXXXX XXXXX XXXXXX XXXX 00.2300 01/14/2013
XXX XXXXXXX CP 0.17%
XXXXXXX XXXXXXX CORP. 0.220
In a joint trading account with XXXXX Xxxxx (USA), Inc. at 0.16%, dated 7/31/2012 due 08/01/2012
XXXX BANK-NEW YORK 0.4
XXXXX Finance Corp. 0%
XXXXX & XXXXXX XXXXXXXX (THE)00.140% 08/07/2012 CP 144A
Xxx Xxxxxx C D 0.500 09/04/2012
REPO XXXXX XXXXX 0.31 08/22/2012
XXXXX XXXXX CORP00.190% 10/10/2012 CP
Xxxxxx Xxxx Xxxx 0.300%
Var .310% Due 01/24/13
XXXXX XXXX CORP.-0.580
XXX XXX XXX CO 98137RH29 DCP 08/02/2012
U.S. TREASURY NOTES-4.327
MUNICIPAL GAS AUTH-1.967
;;;;
run;
0.0028
0.14
0.43
0.45
0.52
0.65
0.646852
0.12
0.09
0.52
.
0.253506
0.45
0.501
0.4
0.375
0.42
0.0008
0.23
0.17
0.22
0.16
0.4
0
0.14
0.5
0.31
0.19
0.3
0.31
0.58
.
4.327
1.967
Arthur, Chang & FriedEgg,
I really appreciate your help. It works just fine.
Best,
Ozzy
Fried,
I am trying to modify your code to allow for use in a SAS dataset stored as follows; but I coudn't get it to work yet. If you have the variables in a SAS dataset, rather in a text file, what shoud replace the infile functions?
Thanks again.
Type | id | name |
A | 1 | "XXXXX U.S. FUNDING DCP, 08/01/2012 / " 0.140000% |
A | 2 | "Xxxxxx Health & Educational Facilities Authority, NATL-RE" 0.430000% |
A | 3 | 0.450% Due 12/06/12 |
B | 4 | 14633 - 0.520% |
B | 5 | 16438 0.65% 01/26/13 |
B | 6 | XXXXXX FINANCE LTD (Q21, Q) 0.646852% |
C | 7 | BANK OF XXXXXXX .12% YCD 0.12% |
C | 8 | BANK OF XXX XXXXXX XXX CO NY00.090% 08/07/2012 CP |
C | 9 | BANK OF XXXXXX XXXX XXX LTD (NY BRANCH) 0.52 |
C | 10 | XX&X CORP |
C | 11 | Xxxxxxx Xxxxx Xxxxxxxx Ltd./LLC 0.253506% |
C | 12 | Xxxxxx Dev. Auth. Poll. Cont. Rev., 0.45% Bonds (New England Pwr. Co. Proj.) Series 1999, tender 08/15/2012, CP Mode |
C | 13 | Xxxxx Xxxxxx Capital Co.;0.501% |
C | 14 | Xxxxx Xxxxx Xxxxxx, 0.4% |
C | 15 | X X X X X X X 0.375 12/28/2012 |
C | 16 | XXXX Xxxxx Xxxxx, A1+/P1 Series, 0.420%, 1/15/2013 |
C | 17 | XXX XXXX XX XXXXXX XX 2012-8-17 0.0008 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.