how to splitting an unstructured column into multiple columns?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

how to splitting an unstructured column into multiple columns?

Hi,

I am having trouble spliiting some unstructured data.

My data looks like this

ColA                                                                             ColB

Fund1                        [2011-07-01--2013-07-31] Richard Glass
Fund2                [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ;                                                                      [2011-06-30--2015-05-15] David Canal
Fund3                [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-                              03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones

I want my data to look like this

 

ColA             Entry1          Exit1          Manager1                Entry2        Exit2        Manager2           Entry3,exit3, Manager3 and so on

Fund1          20110701     20130731  Richard Glass         

Fund2         20110630     20150515   William M. Noonan  20110630 20150515  Phili A.Noonan

 

Currently I am using a scan function to split using delimiters,but it is taking too long

I am trying something like this, but i feel like I need to repeat this code too much and it is not optimal.

data split;
set ms_fundmanager;
length v1-v10 $100.;
array v(10) $;
do i = 1 to dim(v);
v[i]=scan(manager_history,i,';','M');
end;
run;

data split1;
set split;
length av1-av2 $100.;
array av(2) $;
do i = 1 to dim(av);
av[i]=scan(v1,i,']','M');
end;
rename av2 = manager1;
run;


Accepted Solutions
Solution
‎04-22-2017 12:08 AM
Respected Advisor
Posts: 4,609

Re: how to splitting an unstructured column into multiple columns?

This would extract the fields into a long format dataset:

 

data have;
infile datalines4 truncover;
input fund $ txt $500. ;
datalines4;
Fund1  [2011-07-01--2013-07-31] Richard Glass
Fund2  [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ;  [2011-06-30--2015-05-15] David Canal
Fund3  [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
;;;;

data want;
if not prxId then prxId + prxParse("/\[(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})?\] ([^[;]+)/i");
set have;
start = 1;
stop = length(txt);
call prxnext(prxId, start, stop, txt, pos, len);
do No = 1 by 1 while (pos>0);
    entry = prxposn(prxId, 1, txt);
    exit = prxposn(prxId, 2, txt);
    manager = prxposn(prxId, 3, txt);
    output;
    call prxnext(prxId, start, stop, txt, pos, len);
    end;
drop prxId start stop pos len txt;
run;

proc print; run;
PG

View solution in original post


All Replies
Solution
‎04-22-2017 12:08 AM
Respected Advisor
Posts: 4,609

Re: how to splitting an unstructured column into multiple columns?

This would extract the fields into a long format dataset:

 

data have;
infile datalines4 truncover;
input fund $ txt $500. ;
datalines4;
Fund1  [2011-07-01--2013-07-31] Richard Glass
Fund2  [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ;  [2011-06-30--2015-05-15] David Canal
Fund3  [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
;;;;

data want;
if not prxId then prxId + prxParse("/\[(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})?\] ([^[;]+)/i");
set have;
start = 1;
stop = length(txt);
call prxnext(prxId, start, stop, txt, pos, len);
do No = 1 by 1 while (pos>0);
    entry = prxposn(prxId, 1, txt);
    exit = prxposn(prxId, 2, txt);
    manager = prxposn(prxId, 3, txt);
    output;
    call prxnext(prxId, start, stop, txt, pos, len);
    end;
drop prxId start stop pos len txt;
run;

proc print; run;
PG
Occasional Contributor
Posts: 6

Re: how to splitting an unstructured column into multiple columns?

Thank you,  this works wonderfully well. I did not know about those functions, I will learn about them.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 108 views
  • 1 like
  • 2 in conversation