DATA Step, Macro, Functions and more

parsing characters with multiple delimters

Accepted Solution Solved
Reply
Contributor
Posts: 28
Accepted Solution

parsing characters with multiple delimters

[ Edited ]

 

 

 Hello SAS users

 

 I want to split a character variable into multiple (maximum 25) character variables based on comma ,  hyphen -   slash /   colon : semicolon ;  and space 

 

 

for example

 

var_namevar1var2var3var4var5var6var7var8var9
vanguard, fund, is-good;and not bad;class AvanguardfundisgoodandnotbadclassA
 vang-uard, fund  vanguardfund      
bad/ thi:ngs happen: class A     badthingshappenclassA   

 

 

 

like this.

 

Since I got over 1 million rows for var_name so I just want to do it efficiently.

Anybody can give some help?

 

It will be more than welcome.

 

 

 

 

 


Accepted Solutions
Solution
‎05-14-2018 08:34 AM
PROC Star
Posts: 1,215

Re: parsing characters with multiple delimters

[ Edited ]

Would definitely make this data set long. If that is not an option, do something like this

 

data have;
length var_name $200;
var_name="vanguard, fund, is-good;and not bad;class A";output;
var_name=" vang-uard, fund";output;
var_name="bad/ thi:ngs happen: class A";output;
run;

proc sql;
select max(countw(var_name," ,-/;:"))
into :maxnum
from have;
quit;

%put &maxnum.;

data want;
set have;
delims = ',-/:; ';
numWords = countw(var_name, delims);
array var{&maxnum.} $;
do i=1 to dim(var);
var[i] = scan(var_name, i, delims);
end;
drop delims numWords i;
run;

View solution in original post


All Replies
Super User
Super User
Posts: 9,427

Re: parsing characters with multiple delimters

Well, first off some advice, its not a great idea to have variables going across, there could be hundreds and your code then needs to know how many of them etc.  Makes life far more difficult.  What I would do is output at each delimiter to a new obs, then transpose if needed:

 

data want;
  set have
  do i=1 to countw(var_name," ,-;:");
    word=scan(var_name,i," ,-;:");
    output;
  end;
run;

This will give you a long table with word containing the right amount of observations (i.e. if you transpose then there will be many blank variables as not all will have max number of words).

 

 

Another way is to find the max number of words up front and then create an array from that, maybe:

 

proc sql;
  select count(max(countw(var_name," ,-/;:"))) 
  into :marry
  from have;
quit;

data want;
  set have;
  array max_elem{&marry.} $200.;
  do i=1 to countw(var_name," ,-/;:");
max_elem{i}=scan(var_name,i," ,-/;:");
end;
run;

 

Solution
‎05-14-2018 08:34 AM
PROC Star
Posts: 1,215

Re: parsing characters with multiple delimters

[ Edited ]

Would definitely make this data set long. If that is not an option, do something like this

 

data have;
length var_name $200;
var_name="vanguard, fund, is-good;and not bad;class A";output;
var_name=" vang-uard, fund";output;
var_name="bad/ thi:ngs happen: class A";output;
run;

proc sql;
select max(countw(var_name," ,-/;:"))
into :maxnum
from have;
quit;

%put &maxnum.;

data want;
set have;
delims = ',-/:; ';
numWords = countw(var_name, delims);
array var{&maxnum.} $;
do i=1 to dim(var);
var[i] = scan(var_name, i, delims);
end;
drop delims numWords i;
run;
Contributor
Posts: 28

Re: parsing characters with multiple delimters

Perfect really thank you!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 156 views
  • 2 likes
  • 3 in conversation