BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ramani
Fluorite | Level 6

Hi,

I am having trouble spliiting some unstructured data.

My data looks like this

ColA                                                                             ColB

Fund1                        [2011-07-01--2013-07-31] Richard Glass
Fund2                [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ;                                                                      [2011-06-30--2015-05-15] David Canal
Fund3                [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-                              03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones

I want my data to look like this

 

ColA             Entry1          Exit1          Manager1                Entry2        Exit2        Manager2           Entry3,exit3, Manager3 and so on

Fund1          20110701     20130731  Richard Glass         

Fund2         20110630     20150515   William M. Noonan  20110630 20150515  Phili A.Noonan

 

Currently I am using a scan function to split using delimiters,but it is taking too long

I am trying something like this, but i feel like I need to repeat this code too much and it is not optimal.

data split;
set ms_fundmanager;
length v1-v10 $100.;
array v(10) $;
do i = 1 to dim(v);
v[i]=scan(manager_history,i,';','M');
end;
run;

data split1;
set split;
length av1-av2 $100.;
array av(2) $;
do i = 1 to dim(av);
av[i]=scan(v1,i,']','M');
end;
rename av2 = manager1;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This would extract the fields into a long format dataset:

 

data have;
infile datalines4 truncover;
input fund $ txt $500. ;
datalines4;
Fund1  [2011-07-01--2013-07-31] Richard Glass
Fund2  [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ;  [2011-06-30--2015-05-15] David Canal
Fund3  [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
;;;;

data want;
if not prxId then prxId + prxParse("/\[(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})?\] ([^[;]+)/i");
set have;
start = 1;
stop = length(txt);
call prxnext(prxId, start, stop, txt, pos, len);
do No = 1 by 1 while (pos>0);
    entry = prxposn(prxId, 1, txt);
    exit = prxposn(prxId, 2, txt);
    manager = prxposn(prxId, 3, txt);
    output;
    call prxnext(prxId, start, stop, txt, pos, len);
    end;
drop prxId start stop pos len txt;
run;

proc print; run;
PG

View solution in original post

2 REPLIES 2
PGStats
Opal | Level 21

This would extract the fields into a long format dataset:

 

data have;
infile datalines4 truncover;
input fund $ txt $500. ;
datalines4;
Fund1  [2011-07-01--2013-07-31] Richard Glass
Fund2  [2011-06-30--2015-05-15] William M. Noonan; [2011-06-30--2015-05-15] Philip A. Noonan ;  [2011-06-30--2015-05-15] David Canal
Fund3  [2012-01-03--] Robert E. Hall; [2012-01-03--] Keith A. Lee; [2012-01-03--] Kempton M. Ingersol; [2012-01-03--2015-03-01] Amy Y. Zhang; [2013-12-31--] Damien Davis; [2014-12-31--] Andrew J. Fones
;;;;

data want;
if not prxId then prxId + prxParse("/\[(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})?\] ([^[;]+)/i");
set have;
start = 1;
stop = length(txt);
call prxnext(prxId, start, stop, txt, pos, len);
do No = 1 by 1 while (pos>0);
    entry = prxposn(prxId, 1, txt);
    exit = prxposn(prxId, 2, txt);
    manager = prxposn(prxId, 3, txt);
    output;
    call prxnext(prxId, start, stop, txt, pos, len);
    end;
drop prxId start stop pos len txt;
run;

proc print; run;
PG
Ramani
Fluorite | Level 6

Thank you,  this works wonderfully well. I did not know about those functions, I will learn about them.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 347 views
  • 1 like
  • 2 in conversation