Solved: how to splitting an unstructured column into multiple columns?

Ramani · Posted 04-21-2017 11:06 PM

Hi,

I am having trouble spliiting some unstructured data.

My data looks like this

ColA ColB

Fund1 [2011-07-01--2013-07-31] Richard Glass
Fund2 [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ; [2011-06-30--2015-05-15] David Canal
Fund3 [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015- 03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones

I want my data to look like this

ColA Entry1 Exit1 Manager1 Entry2 Exit2 Manager2 Entry3,exit3, Manager3 and so on

Fund1 20110701 20130731 Richard Glass

Fund2 20110630 20150515 William M. Noonan 20110630 20150515 Phili A.Noonan

Currently I am using a scan function to split using delimiters,but it is taking too long

I am trying something like this, but i feel like I need to repeat this code too much and it is not optimal.

data split;
set ms_fundmanager;
length v1-v10 $100.;
array v(10) $;
do i = 1 to dim(v);
v[i]=scan(manager_history,i,';','M');
end;
run;

data split1;
set split;
length av1-av2 $100.;
array av(2) $;
do i = 1 to dim(av);
av[i]=scan(v1,i,']','M');
end;
rename av2 = manager1;
run;

PGStats · Posted 04-21-2017 11:42 PM

This would extract the fields into a long format dataset:

data have;
infile datalines4 truncover;
input fund $ txt $500. ;
datalines4;
Fund1  [2011-07-01--2013-07-31] Richard Glass
Fund2  [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ;  [2011-06-30--2015-05-15] David Canal
Fund3  [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
;;;;

data want;
if not prxId then prxId + prxParse("/\[(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})?\] ([^[;]+)/i");
set have;
start = 1;
stop = length(txt);
call prxnext(prxId, start, stop, txt, pos, len);
do No = 1 by 1 while (pos>0);
    entry = prxposn(prxId, 1, txt);
    exit = prxposn(prxId, 2, txt);
    manager = prxposn(prxId, 3, txt);
    output;
    call prxnext(prxId, start, stop, txt, pos, len);
    end;
drop prxId start stop pos len txt;
run;

proc print; run;

PG

View solution in original post

PGStats · Posted 04-21-2017 11:42 PM

This would extract the fields into a long format dataset:

data have;
infile datalines4 truncover;
input fund $ txt $500. ;
datalines4;
Fund1  [2011-07-01--2013-07-31] Richard Glass
Fund2  [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ;  [2011-06-30--2015-05-15] David Canal
Fund3  [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
;;;;

data want;
if not prxId then prxId + prxParse("/\[(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})?\] ([^[;]+)/i");
set have;
start = 1;
stop = length(txt);
call prxnext(prxId, start, stop, txt, pos, len);
do No = 1 by 1 while (pos>0);
    entry = prxposn(prxId, 1, txt);
    exit = prxposn(prxId, 2, txt);
    manager = prxposn(prxId, 3, txt);
    output;
    call prxnext(prxId, start, stop, txt, pos, len);
    end;
drop prxId start stop pos len txt;
run;

proc print; run;

PG

Ramani · Posted 04-22-2017 12:08 AM

Thank you, this works wonderfully well. I did not know about those functions, I will learn about them.

how to splitting an unstructured column into multiple columns?

Re: how to splitting an unstructured column into multiple columns?

Re: how to splitting an unstructured column into multiple columns?

Re: how to splitting an unstructured column into multiple columns?

how to splitting an unstructured column into multiple columns?

Re: how to splitting an unstructured column into multiple columns?

Re: how to splitting an unstructured column into multiple columns?

Re: how to splitting an unstructured column into multiple columns?

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!