Hey All!
I have a character variable of varying lengths that I want to separate into separate variables. The variable always follows the following pattern - AA.AA.AA.AA.AA (2 characters then a period, 2 characters then a period, etc) - and each "AA" signifies a unique code I need to have separated for further analysis.
Example of what it looks like now:
orig_var |
AA.4F.9U.EB |
E5.U9 |
GB.D2.4F.EB.AA.M2 |
T4 |
YY.4D.U9 |
How I'd like for it to look:
orig_var | new_var1 | new_var2 | new_var3 | new_var4 | new_var5 | new_var6 |
AA.4F.9U.EB | AA | 4F | 9U | EB | . | . |
E5.U9 | . | . | . | . | ||
GB.D2.4F.EB.AA.M2 | GB | D2 | 4F | EB | AA | M2 |
T4 | T4 | . | . | . | . | . |
YY.4D.U9 | YY | 4D | U9 | . | . | . |
Any thoughts on how to code this to parse those 2-char codes?
Thanks!
Erice
Do you know the max number of variables? if so loop using the scan function and . as your delimiter.
data want;
set have;
array parsed_vars(*) new_var1-new_var6;
i=1;
do while(scan(str, i, ".") ne "");
parsed_vars(i) =scan(str, i, ".");
i+1;
end;
run;
Specify the period as your delimiter.
Can you be more specific? Also, note that orig_var is one variable in a SAS dataset that has like 200 other variables. It's not a raw data file for me to import.
That changes things then. A simple approach would be to use something like the scan function.
newvar1=scan(orig_var, 1, '.');
newvar2=scan(orig_var, 2, '.');
etc
Do you know the max number of variables? if so loop using the scan function and . as your delimiter.
data want;
set have;
array parsed_vars(*) new_var1-new_var6;
i=1;
do while(scan(str, i, ".") ne "");
parsed_vars(i) =scan(str, i, ".");
i+1;
end;
run;
Reeza --
That worked except that there was a numeric vs character mismatch betweent he array variables and the orig_var pieces. When I added _char_ at the end of the array line, it work PERFECTLY. I was attempting some sort of array thing but just couldn't get it quite right on my own.
THANKS!!
Eric
I think you should add the dollar sign instead of _char_. That would add in all the character variables to your array and you could potentially overwrite a variable if you had more than 6 variables in your variable that was to be parsed. In fact, it may not be a bad idea to explicitly declare the array length rather than use the * method.
array parsed_vars(6) $ new_var1-new_var6;
If you don't know the max. number of elements in orig_var, it can be found and stored in a macro variable by proc sql. The number can then be used to dynamically create the required number of new variables in the output data set:
proc sql noprint;
select max(count(orig_var,'.'))+1 into :maxelements from have;
quit;
data want (drop=i); set have;
array parsed_vars $ 2 new_var1-new_var%eval(&maxelements);
do i = 1 to &maxelements;
parsed_vars{i} = scan(orig_var,i,'.');
end;
run;
Another low-maintenance dynamic option can be Proc Transpose:
data have;
input var $40.;
length svar $ 2;
do i=1 by 1 until (missing(svar));
svar=scan(var,i,'.');
if not missing(svar) then
output;
end;
cards;
AA.4F.9U.EB
E5.U9
GB.D2.4F.EB.AA.M2
T4
YY.4D.U9
;
proc sort data=have;
by var;
run;
proc transpose data=have out=want (drop=_name_) prefix=new_var;
by var;
var svar;
run;
* Nice ! ;
* My minor edits:
* 1. no PRoc sort needed. ;
* 2. use ID statement ;
proc transpose data=have out=want (drop=_name_) prefix=var;
by var notsorted;
id i;
var svar;
run;
An alternative to Reeza's solution is to use a for-loop rather than a while (for people that don't trust while loops):
data want;
set have;
array parsed_vars(*) new_var1-new_var6;
do i = 1 to 6;
parsed_vars(i) =scan(str, i, ".");
end;
run;
The 6 can be redefined as a macro variable. In my code I have a length statement before the array statement, I do not know whether it works for al data without the length statement.
Hi @CR204, curious what you mean by not trusting WHILE loops? They've always been trustworthy to me. I suppose one risk they bring is that you can accidentally write an infinite loop. But to me, they're still worthy of trust and are a useful tool. Are you thinking of other concerns?
@Quentin While loops can be useful, but I dislike using them when a simple for loop would work. The "don't trust" part should be read with a wink. 😉
Hi there,
I used your code which worked well in my project. Today, however, I am getting an error (Array subscript out of range at line 29, column 1).
This is my code: Not sure why it worked last week and not working today. Any help would be appreciated!
data Parsed_Topics;
set have;
array PARSED_VARS(*)$100 Topic1-Topic12;
i=1;
do while(scan(TrainingTopic, i, ";") ne "");
PARSED_VARS(i) =scan(TrainingTopic, i, ";");
i+1;
end;
run;
See if string variable TrainingTopic contains more than 12 delimiters ':' - 12 is the array upbound. Hop it helps.
L.L
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.