- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi there,
I have a dataset with long text fields. I need to replace every instance of a dollar amount, say, "439.33" with "439.33USD"
Here's a sample data set:
data one;
infile cards ;
length name $200;
input name;
cards;
Ccy="USD">738.56</NS!figlwijfoaiu345Ccy="USD">38.44</ScT@
run;
So what I have is:
What I want is USD to be present after every dollar amount as is here:
I found two different community forums, and know that I will need a combination of
- index
- To find the decimal point that follows any "USD" in the text
- substr
- tranwrd functions
- To replace that dollar amount with [dollaramount]USD
There could be multiple instances of Ccy="USD">[dollar amount] in the string. I just want to tack on the USD after the dollar amount, but not every dollar amount is XYZ.AB, some are YZ.AB, some are Z.AB etc. There will always be a decimal point with two decimal places though that follow the Ccy="USD"
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Is it always USD? Or are there other currencies?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This replaces:
at least a digit followed by a dot followed by 2 digits
with:
the group found followed by USD.
data TWO;
set ONE;
NAME2=prxchange('s/(\d+\.\d\d)/\1USD/',-1,NAME);
run;
{edited to avoid losing one character}
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The syntax is that of Perl regular expressions.
Tons of tutorials on the web, that's how I learnt.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The solution presented by @ChrisNZ goes part of the way, but I would add a bit to make sure that only the right numbers get selected:
name=prxchange('s/("USD">\d+\.\d\d)(?=<)/\1USD/',-1,name);
This looks for the string "USD"> followed by at least one digit, a period and two digits (that's the stuff in the first paranthesis). The whole thing should be followed by "<", but that string is not included in the match (the second paranthesis, "?=" denotes a look-ahead buffer), the match is replaced with the same plus the text "USD".