Hello SAS users
I want to split a character variable into multiple (maximum 25) character variables based on comma , hyphen - slash / colon : semicolon ; and space
for example
var_name | var1 | var2 | var3 | var4 | var5 | var6 | var7 | var8 | var9 |
vanguard, fund, is-good;and not bad;class A | vanguard | fund | is | good | and | not | bad | class | A |
vang-uard, fund | vang | uard | fund | ||||||
bad/ thi:ngs happen: class A | bad | thi | ngs | happen | class | A |
like this.
Since I got over 1 million rows for var_name so I just want to do it efficiently.
Anybody can give some help?
It will be more than welcome.
Would definitely make this data set long. If that is not an option, do something like this
data have;
length var_name $200;
var_name="vanguard, fund, is-good;and not bad;class A";output;
var_name=" vang-uard, fund";output;
var_name="bad/ thi:ngs happen: class A";output;
run;
proc sql;
select max(countw(var_name," ,-/;:"))
into :maxnum
from have;
quit;
%put &maxnum.;
data want;
set have;
delims = ',-/:; ';
numWords = countw(var_name, delims);
array var{&maxnum.} $;
do i=1 to dim(var);
var[i] = scan(var_name, i, delims);
end;
drop delims numWords i;
run;
Well, first off some advice, its not a great idea to have variables going across, there could be hundreds and your code then needs to know how many of them etc. Makes life far more difficult. What I would do is output at each delimiter to a new obs, then transpose if needed:
data want; set have do i=1 to countw(var_name," ,-;:"); word=scan(var_name,i," ,-;:"); output; end; run;
This will give you a long table with word containing the right amount of observations (i.e. if you transpose then there will be many blank variables as not all will have max number of words).
Another way is to find the max number of words up front and then create an array from that, maybe:
proc sql; select count(max(countw(var_name," ,-/;:"))) into :marry from have; quit; data want; set have; array max_elem{&marry.} $200.; do i=1 to countw(var_name," ,-/;:");
max_elem{i}=scan(var_name,i," ,-/;:");
end;
run;
Would definitely make this data set long. If that is not an option, do something like this
data have;
length var_name $200;
var_name="vanguard, fund, is-good;and not bad;class A";output;
var_name=" vang-uard, fund";output;
var_name="bad/ thi:ngs happen: class A";output;
run;
proc sql;
select max(countw(var_name," ,-/;:"))
into :maxnum
from have;
quit;
%put &maxnum.;
data want;
set have;
delims = ',-/:; ';
numWords = countw(var_name, delims);
array var{&maxnum.} $;
do i=1 to dim(var);
var[i] = scan(var_name, i, delims);
end;
drop delims numWords i;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.