Hi,
I am having trouble spliiting some unstructured data.
My data looks like this
ColA ColB
Fund1 [2011-07-01--2013-07-31] Richard Glass
Fund2 [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ; [2011-06-30--2015-05-15] David Canal
Fund3 [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015- 03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
I want my data to look like this
ColA Entry1 Exit1 Manager1 Entry2 Exit2 Manager2 Entry3,exit3, Manager3 and so on
Fund1 20110701 20130731 Richard Glass
Fund2 20110630 20150515 William M. Noonan 20110630 20150515 Phili A.Noonan
Currently I am using a scan function to split using delimiters,but it is taking too long
I am trying something like this, but i feel like I need to repeat this code too much and it is not optimal.
data split;
set ms_fundmanager;
length v1-v10 $100.;
array v(10) $;
do i = 1 to dim(v);
v[i]=scan(manager_history,i,';','M');
end;
run;
data split1;
set split;
length av1-av2 $100.;
array av(2) $;
do i = 1 to dim(av);
av[i]=scan(v1,i,']','M');
end;
rename av2 = manager1;
run;
This would extract the fields into a long format dataset:
data have;
infile datalines4 truncover;
input fund $ txt $500. ;
datalines4;
Fund1 [2011-07-01--2013-07-31] Richard Glass
Fund2 [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ; [2011-06-30--2015-05-15] David Canal
Fund3 [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
;;;;
data want;
if not prxId then prxId + prxParse("/\[(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})?\] ([^[;]+)/i");
set have;
start = 1;
stop = length(txt);
call prxnext(prxId, start, stop, txt, pos, len);
do No = 1 by 1 while (pos>0);
entry = prxposn(prxId, 1, txt);
exit = prxposn(prxId, 2, txt);
manager = prxposn(prxId, 3, txt);
output;
call prxnext(prxId, start, stop, txt, pos, len);
end;
drop prxId start stop pos len txt;
run;
proc print; run;
This would extract the fields into a long format dataset:
data have;
infile datalines4 truncover;
input fund $ txt $500. ;
datalines4;
Fund1 [2011-07-01--2013-07-31] Richard Glass
Fund2 [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ; [2011-06-30--2015-05-15] David Canal
Fund3 [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
;;;;
data want;
if not prxId then prxId + prxParse("/\[(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})?\] ([^[;]+)/i");
set have;
start = 1;
stop = length(txt);
call prxnext(prxId, start, stop, txt, pos, len);
do No = 1 by 1 while (pos>0);
entry = prxposn(prxId, 1, txt);
exit = prxposn(prxId, 2, txt);
manager = prxposn(prxId, 3, txt);
output;
call prxnext(prxId, start, stop, txt, pos, len);
end;
drop prxId start stop pos len txt;
run;
proc print; run;
Thank you, this works wonderfully well. I did not know about those functions, I will learn about them.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.