Hello,
I have a variable in my dataset called 'Case_Notes' that could start with a dollar value, and I want to extract that dollar value into a new variable called 'Protected Balance' if it exists. I have been directed to perl regular expressions but am getting lost and could use some help. Here is an example of what the data looks like:
Case_Notes
$15,229.00 , 9999999999 - known fraud
" 9999999999"
$1,740.00 , 9999999999- known fraud - calling on multiple accounts
$1,155.79 , 9999999999- known fraud -male fraudster calling on female account
$14,026.81, 9999999999 - known fraud - fraudster continues to call in
$12,495.00 , 9999999999- fraudster calling on multiple accounts
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$2,565.67 , 9999999999- Two accounts opened and different addresses. Caller stated he send docs in already.
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
9999999999- known fraud - ani calling in IVR on multiple accounts |3974950358 - known fraud - ani calling in IVR on multiple accounts
9999999999 - fraudster calling in , true person lives in SC.
9999999999 - known fraud - female calling on multiple accounts
What I need is:
Protected_Balance
$15,229.00
-
$1,740.00
$1,155.79
$14,026.81
$9,936.00
$9,936.00
$5,150.00
$2,565.67
$5,150.00
$5,150.00
-
-
My latest attempt was:
data want;
set have;
format ProtectedBalance dollar10.2;
ProtectedBalance = prxchange('s/.*(\$\d+).*/\1/',-1,case_notes);
where Fraud_Status = 'FRAUD';
run;
But that got me nowhere. Any help would be appreciated.
Keep it simple:
data have;
length case_notes $200;
infile datalines truncover;
input Case_Notes $200.;
datalines;
$15,229.00 , 9999999999 - known fraud
" 9999999999"
$1,740.00 , 9999999999- known fraud - calling on multiple accounts
$1,155.79 , 9999999999- known fraud -male fraudster calling on female account
$14,026.81, 9999999999 - known fraud - fraudster continues to call in
$12,495.00 , 9999999999- fraudster calling on multiple accounts
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$2,565.67 , 9999999999- Two accounts opened and different addresses. Caller stated he send docs in already.
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
9999999999- known fraud - ani calling in IVR on multiple accounts |3974950358 - known fraud - ani calling in IVR on multiple accounts
9999999999 - fraudster calling in , true person lives in SC.
9999999999 - known fraud - female calling on multiple accounts
;
data want;
if not prxId then prxId + prxParse("/\$[0-9,.]+\d\b/");
set have;
call prxSubstr(prxId, Case_Notes, pos, len);
if pos > 0 then Protected_Balance = input(substr(Case_Notes, pos, len), ?? comma32.);
drop prxId pos len;
run;
proc print data=want noobs; run;
Try this:
data want;
format ProtectedBalance dollar10.2;
set have;
if _N_=1 then do;
retain patt1;
pattern1="/[\$][,.\d]{1,}/";
patt1=prxparse(pattern1);
end;
call prxsubstr(patt1, case_notes, position1, length1);
if position1>0 then ProtectedBalance=substr(case_notes, position1, length1);
if reverse(substr(ProtectedBalance,1,1))=',' then ProtectedBalance=substr(ProtectedBalance,1, length(ProtectedBalance)-1);
run;
Keep it simple:
data have;
length case_notes $200;
infile datalines truncover;
input Case_Notes $200.;
datalines;
$15,229.00 , 9999999999 - known fraud
" 9999999999"
$1,740.00 , 9999999999- known fraud - calling on multiple accounts
$1,155.79 , 9999999999- known fraud -male fraudster calling on female account
$14,026.81, 9999999999 - known fraud - fraudster continues to call in
$12,495.00 , 9999999999- fraudster calling on multiple accounts
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$9,936.00 , 9999999999- known fraud - young female calling as ch - true person lives in OH
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$2,565.67 , 9999999999- Two accounts opened and different addresses. Caller stated he send docs in already.
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
$5,150.00 , 9999999999- known fraud - male fraudster still calling in
9999999999- known fraud - ani calling in IVR on multiple accounts |3974950358 - known fraud - ani calling in IVR on multiple accounts
9999999999 - fraudster calling in , true person lives in SC.
9999999999 - known fraud - female calling on multiple accounts
;
data want;
if not prxId then prxId + prxParse("/\$[0-9,.]+\d\b/");
set have;
call prxSubstr(prxId, Case_Notes, pos, len);
if pos > 0 then Protected_Balance = input(substr(Case_Notes, pos, len), ?? comma32.);
drop prxId pos len;
run;
proc print data=want noobs; run;
Thank you very much for this! Worked perfectly. I appreciate the help.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.