Hello,
I have a variable in my dataset called 'Case_Notes' that could start with a dollar value, and I want to extract that dollar value into a new variable called 'Protected Balance' if it exists. I have been directed to perl regular expressions but am getting lost and could use some help. Here is an example of what the data looks like:
Case_Notes
$15,229.00 , 9999999999 - known fraud
" 9999999999"
$1,740.00 , 9999999999- known fraud - calling on multiple accounts
$1,155.79 , 9999999999- known fraud -male fraudster calling on female account
$14,026.81, 9999999999 - known fraud - fraudster continues to call in
$12,495.00 , 9999999999- fraudster calling on multiple accounts
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$2,565.67 , 9999999999- Two accounts opened and different addresses. Caller stated he send docs in already.
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
9999999999- known fraud - ani calling in IVR on multiple accounts |3974950358 - known fraud - ani calling in IVR on multiple accounts
9999999999 - fraudster calling in , true person lives in SC.
9999999999 - known fraud - female calling on multiple accounts
What I need is:
Protected_Balance
$15,229.00
-
$1,740.00
$1,155.79
$14,026.81
$9,936.00
$9,936.00
$5,150.00
$2,565.67
$5,150.00
$5,150.00
-
-
My latest attempt was:
data want;
set have;
format ProtectedBalance dollar10.2;
ProtectedBalance = prxchange('s/.*(\$\d+).*/\1/',-1,case_notes);
where Fraud_Status = 'FRAUD';
run;
But that got me nowhere. Any help would be appreciated.
Keep it simple:
data have;
length case_notes $200;
infile datalines truncover;
input Case_Notes $200.;
datalines;
$15,229.00 , 9999999999 - known fraud
" 9999999999"
$1,740.00 , 9999999999- known fraud - calling on multiple accounts
$1,155.79 , 9999999999- known fraud -male fraudster calling on female account
$14,026.81, 9999999999 - known fraud - fraudster continues to call in
$12,495.00 , 9999999999- fraudster calling on multiple accounts
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$2,565.67 , 9999999999- Two accounts opened and different addresses. Caller stated he send docs in already.
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
9999999999- known fraud - ani calling in IVR on multiple accounts |3974950358 - known fraud - ani calling in IVR on multiple accounts
9999999999 - fraudster calling in , true person lives in SC.
9999999999 - known fraud - female calling on multiple accounts
;
data want;
if not prxId then prxId + prxParse("/\$[0-9,.]+\d\b/");
set have;
call prxSubstr(prxId, Case_Notes, pos, len);
if pos > 0 then Protected_Balance = input(substr(Case_Notes, pos, len), ?? comma32.);
drop prxId pos len;
run;
proc print data=want noobs; run;
Try this:
data want;
format ProtectedBalance dollar10.2;
set have;
if _N_=1 then do;
retain patt1;
pattern1="/[\$][,.\d]{1,}/";
patt1=prxparse(pattern1);
end;
call prxsubstr(patt1, case_notes, position1, length1);
if position1>0 then ProtectedBalance=substr(case_notes, position1, length1);
if reverse(substr(ProtectedBalance,1,1))=',' then ProtectedBalance=substr(ProtectedBalance,1, length(ProtectedBalance)-1);
run;
Keep it simple:
data have;
length case_notes $200;
infile datalines truncover;
input Case_Notes $200.;
datalines;
$15,229.00 , 9999999999 - known fraud
" 9999999999"
$1,740.00 , 9999999999- known fraud - calling on multiple accounts
$1,155.79 , 9999999999- known fraud -male fraudster calling on female account
$14,026.81, 9999999999 - known fraud - fraudster continues to call in
$12,495.00 , 9999999999- fraudster calling on multiple accounts
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$2,565.67 , 9999999999- Two accounts opened and different addresses. Caller stated he send docs in already.
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
9999999999- known fraud - ani calling in IVR on multiple accounts |3974950358 - known fraud - ani calling in IVR on multiple accounts
9999999999 - fraudster calling in , true person lives in SC.
9999999999 - known fraud - female calling on multiple accounts
;
data want;
if not prxId then prxId + prxParse("/\$[0-9,.]+\d\b/");
set have;
call prxSubstr(prxId, Case_Notes, pos, len);
if pos > 0 then Protected_Balance = input(substr(Case_Notes, pos, len), ?? comma32.);
drop prxId pos len;
run;
proc print data=want noobs; run;
Thank you very much for this! Worked perfectly. I appreciate the help.
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.