BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Total_FREQ
Calcite | Level 5

Hey All!

I have a character variable of varying lengths that I want to separate into separate variables.  The variable always follows the following pattern - AA.AA.AA.AA.AA (2 characters then a period, 2 characters then a period, etc) - and each "AA" signifies a unique code I need to have separated for further analysis. 

Example of what it looks like now:

orig_var
AA.4F.9U.EB
E5.U9
GB.D2.4F.EB.AA.M2
T4
YY.4D.U9

How I'd like for it to look:

orig_varnew_var1new_var2new_var3new_var4new_var5new_var6
AA.4F.9U.EBAA4F9UEB..
E5.U9....
GB.D2.4F.EB.AA.M2GBD24FEBAAM2
T4T4.....
YY.4D.U9YY4DU9...

Any thoughts on how to code this to parse those 2-char codes?

Thanks!

Erice

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Do you know the max number of variables? if so loop using the scan function and . as your delimiter.

data want;

set have;

array parsed_vars(*) new_var1-new_var6;

i=1;

do while(scan(str, i, ".") ne "");

parsed_vars(i) =scan(str, i, ".");

i+1;

end;

run;

View solution in original post

18 REPLIES 18
Fugue
Quartz | Level 8

Specify the period as your delimiter.

Total_FREQ
Calcite | Level 5

Can you be more specific?  Also, note that orig_var is one variable in a SAS dataset that has like 200 other variables.  It's not a raw data file for me to import.

Fugue
Quartz | Level 8

That changes things then. A simple approach would be to use something like the scan function.

newvar1=scan(orig_var, 1, '.');

newvar2=scan(orig_var, 2, '.');

etc

Reeza
Super User

Do you know the max number of variables? if so loop using the scan function and . as your delimiter.

data want;

set have;

array parsed_vars(*) new_var1-new_var6;

i=1;

do while(scan(str, i, ".") ne "");

parsed_vars(i) =scan(str, i, ".");

i+1;

end;

run;

Total_FREQ
Calcite | Level 5

Reeza --

That worked except that there was a numeric vs character mismatch betweent he array variables and the orig_var pieces.  When I added _char_ at the end of the array line, it work PERFECTLY.  I was attempting some sort of array thing but just couldn't get it quite right on my own.

THANKS!!

Eric

Reeza
Super User

I think you should add the dollar sign instead of _char_. That would add in all the character variables to your array and you could potentially overwrite a variable if you had more than 6 variables in your variable that was to be parsed. In fact, it may not be a bad idea to explicitly declare the array length rather than use the * method. 

array parsed_vars(6) $  new_var1-new_var6;

ErikLund_Jensen
Rhodochrosite | Level 12

If you don't know the max. number of elements in orig_var, it can be found and stored in a macro variable by proc sql. The number can then be used to dynamically create the required number of new variables in the output data set:

proc sql noprint;

  select max(count(orig_var,'.'))+1 into :maxelements from have;

quit;

data want (drop=i); set have;

  array parsed_vars $ 2 new_var1-new_var%eval(&maxelements);

  do i = 1 to &maxelements;

  parsed_vars{i} = scan(orig_var,i,'.');

  end;

run;

Haikuo
Onyx | Level 15

Another low-maintenance dynamic option can be Proc Transpose:

data have;

     input var $40.;

     length svar $ 2;

     do i=1 by 1 until (missing(svar));

           svar=scan(var,i,'.');

           if not missing(svar) then

                output;

     end;

     cards;

AA.4F.9U.EB

E5.U9

GB.D2.4F.EB.AA.M2

T4

YY.4D.U9

;

proc sort data=have;

     by var;

run;

proc transpose data=have out=want (drop=_name_) prefix=new_var;

     by var;

     var svar;

run;

Quinn
Fluorite | Level 6

* Nice ! ;
* My minor edits:
* 1. no PRoc sort needed. ;
* 2. use ID statement ;
proc transpose data=have out=want (drop=_name_) prefix=var;
by var notsorted;
id i;
var svar;
run;

CR204
Calcite | Level 5

An alternative to Reeza's solution is to use a for-loop rather than a while (for people that don't trust while loops):

 

data want;
    set have;
    array parsed_vars(*) new_var1-new_var6;

    do i = 1 to 6;
        parsed_vars(i) =scan(str, i, ".");
    end;

run;

The 6 can be redefined as a macro variable. In my code I have a length statement before the array statement, I do not know whether it works for al data without the length statement.

Quentin
Super User

Hi @CR204, curious what you mean by not trusting WHILE loops?  They've always been trustworthy to me.  I suppose one risk they bring is that you can accidentally write an infinite loop.  But to me, they're still worthy of trust and are a useful tool.  Are you thinking of other concerns?

 

CR204
Calcite | Level 5

@Quentin While loops can be useful, but I dislike using them when a simple for loop would work. The "don't trust" part should be read with a wink. 😉

mstergia
Calcite | Level 5

Hi there,

 

I used your code which worked well in my project.  Today, however, I am getting an error (Array subscript out of range at line 29, column 1).

 

This is my code:  Not sure why it worked last week and not working today.  Any help would be appreciated!

data Parsed_Topics;

set have;

array PARSED_VARS(*)$100 Topic1-Topic12;

i=1;

do while(scan(TrainingTopic, i, ";") ne "");

 

PARSED_VARS(i) =scan(TrainingTopic, i, ";");

i+1;

end;

run;

 

 

Quinn
Fluorite | Level 6

See if string variable TrainingTopic contains more than 12 delimiters ':' - 12 is the array upbound.   Hop it helps.

L.L 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 93435 views
  • 12 likes
  • 9 in conversation