parsing a character string into new variables

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

parsing a character string into new variables

Hey All!

I have a character variable of varying lengths that I want to separate into separate variables.  The variable always follows the following pattern - AA.AA.AA.AA.AA (2 characters then a period, 2 characters then a period, etc) - and each "AA" signifies a unique code I need to have separated for further analysis. 

Example of what it looks like now:

orig_var
AA.4F.9U.EB
E5.U9
GB.D2.4F.EB.AA.M2
T4
YY.4D.U9

How I'd like for it to look:

orig_varnew_var1new_var2new_var3new_var4new_var5new_var6
AA.4F.9U.EBAA4F9UEB..
E5.U9....
GB.D2.4F.EB.AA.M2GBD24FEBAAM2
T4T4.....
YY.4D.U9YY4DU9...

Any thoughts on how to code this to parse those 2-char codes?

Thanks!

Erice


Accepted Solutions
Solution
‎09-04-2013 01:27 PM
Super User
Posts: 17,784

Re: parsing a character string into new variables

Do you know the max number of variables? if so loop using the scan function and . as your delimiter.

data want;

set have;

array parsed_vars(*) new_var1-new_var6;

i=1;

do while(scan(str, i, ".") ne "");

parsed_vars(i) =scan(str, i, ".");

i+1;

end;

run;

View solution in original post


All Replies
Super Contributor
Posts: 307

Re: parsing a character string into new variables

Specify the period as your delimiter.

New Contributor
Posts: 3

Re: parsing a character string into new variables

Can you be more specific?  Also, note that orig_var is one variable in a SAS dataset that has like 200 other variables.  It's not a raw data file for me to import.

Super Contributor
Posts: 307

Re: parsing a character string into new variables

That changes things then. A simple approach would be to use something like the scan function.

newvar1=scan(orig_var, 1, '.');

newvar2=scan(orig_var, 2, '.');

etc

Solution
‎09-04-2013 01:27 PM
Super User
Posts: 17,784

Re: parsing a character string into new variables

Do you know the max number of variables? if so loop using the scan function and . as your delimiter.

data want;

set have;

array parsed_vars(*) new_var1-new_var6;

i=1;

do while(scan(str, i, ".") ne "");

parsed_vars(i) =scan(str, i, ".");

i+1;

end;

run;

New Contributor
Posts: 3

Re: parsing a character string into new variables

Reeza --

That worked except that there was a numeric vs character mismatch betweent he array variables and the orig_var pieces.  When I added _char_ at the end of the array line, it work PERFECTLY.  I was attempting some sort of array thing but just couldn't get it quite right on my own.

THANKS!!

Eric

Super User
Posts: 17,784

Re: parsing a character string into new variables

I think you should add the dollar sign instead of _char_. That would add in all the character variables to your array and you could potentially overwrite a variable if you had more than 6 variables in your variable that was to be parsed. In fact, it may not be a bad idea to explicitly declare the array length rather than use the * method. 

array parsed_vars(6) $  new_var1-new_var6;

Contributor
Posts: 30

Re: parsing a character string into new variables

If you don't know the max. number of elements in orig_var, it can be found and stored in a macro variable by proc sql. The number can then be used to dynamically create the required number of new variables in the output data set:

proc sql noprint;

  select max(count(orig_var,'.'))+1 into :maxelements from have;

quit;

data want (drop=i); set have;

  array parsed_vars $ 2 new_var1-new_var%eval(&maxelements);

  do i = 1 to &maxelements;

  parsed_vars{i} = scan(orig_var,i,'.');

  end;

run;

Respected Advisor
Posts: 3,124

Re: parsing a character string into new variables

Another low-maintenance dynamic option can be Proc Transpose:

data have;

     input var $40.;

     length svar $ 2;

     do i=1 by 1 until (missing(svar));

           svar=scan(var,i,'.');

           if not missing(svar) then

                output;

     end;

     cards;

AA.4F.9U.EB

E5.U9

GB.D2.4F.EB.AA.M2

T4

YY.4D.U9

;

proc sort data=have;

     by var;

run;

proc transpose data=have out=want (drop=_name_) prefix=new_var;

     by var;

     var svar;

run;

New Contributor
Posts: 3

Re: parsing a character string into new variables

An alternative to Reeza's solution is to use a for-loop rather than a while (for people that don't trust while loops):

 

data want;
    set have;
    array parsed_vars(*) new_var1-new_var6;

    do i = 1 to 6;
        parsed_vars(i) =scan(str, i, ".");
    end;

run;

The 6 can be redefined as a macro variable. In my code I have a length statement before the array statement, I do not know whether it works for al data without the length statement.

PROC Star
Posts: 1,230

Re: parsing a character string into new variables

Hi @CR204, curious what you mean by not trusting WHILE loops?  They've always been trustworthy to me.  I suppose one risk they bring is that you can accidentally write an infinite loop.  But to me, they're still worthy of trust and are a useful tool.  Are you thinking of other concerns?

 

New Contributor
Posts: 3

Re: parsing a character string into new variables

@Quentin While loops can be useful, but I dislike using them when a simple for loop would work. The "don't trust" part should be read with a wink. Smiley Wink

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 19715 views
  • 8 likes
  • 7 in conversation