Character variables with varying number of "words" and how to parse them correctly in new variables

Contributor
Posts: 68

Hi,

I have a character variable with values such as:

VAR0


ABC_F_P_LOW

DEF_K_HIGH

ZPQ_M_X_MEDIUM

GKL_J_B

.....................

I would like to create four new variables as:

V1     V2     V3     V4
ABC    F      P      LOW
DEF    K             HIGH
ZPQ    M      X      MEDIUM
GKL    J      B

.................................

I have tried several character functions, but they do not place the "words" in the correct new variable (e.g. HIGH goes to V3 when it should go to V4).

I would appreciate any suggestions.

Thank you in advance

Kind regards

Nikos


Accepted Solutions
Solution
‎03-12-2012 02:11 PM
Super Contributor
Posts: 1,636

Combining Art's and Tom's methods:

data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;

data want;
set x;
array v{4} $ ;
do _n_=1 to 4;
  v{_n_}=scan(tranwrd(var0,'__','_ _'),_n_,'_');
end;

proc print;run;

obs    VAR0              v1     v2    v3    v4
1      ABC_F_P_LOW       ABC    F     P     LOW
2      DEF_K__HIGH       DEF    K           HIGH
3      ZPQ_M_X_MEDIUM    ZPQ    M     X     MEDIUM
4      GKL_J_B           GKL    J     B


All Replies
Super User
Posts: 9,676

OK.

But I think you should post some more data to clarify your question.

data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K_HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;
data want(drop=temp i);
 set x;
 array v{4} $ ;
 do i=1 to 4;
  temp=scan(var0,i,'_');
  if temp not in ('LOW' 'HIGH' 'MEDIUM' ' ' )  then v{i}=temp;
   else if not missing(temp) then v{4}=temp;
 end;
run;


Ksharp

Respected Advisor
Posts: 3,889

If this is only about assigning the terms Low, Medium, High to the 4th variable then Ksharp provided you with a solution.

If other "strings" also need to be assigned to specific variables, then you would have to tell us a bit more about the rule that decides which "word" goes where, or you need an exhaustive list of words together with the position each should go to (a kind of key/value pair list).

PROC Star
Posts: 7,363

Nikos, you have to clarify your rules.  Just in your example, you have high going to V4, but B going to V3.  Why?

Super User
Posts: 5,081

Nikos,

As others have noted, KSharp's solution is a good one.

Here are a few situations that you may have to address ... it depends on what your data contains.

Is it possible that the third word (or even the first or second word) could also contain HIGH / MEDIUM / LOW:

ABC_F_LOW_LOW

In that case, how would you know how to parse this:

ABC_F_LOW

Should LOW definitely become V4, or is it possible it could become V3?

Do you know enough about the data to assign shorter lengths to V1-V4? 

The answers may be very easy, but you're the only one who would know.

Good luck.

Contributor
Posts: 68

Art,

You were right. That value should read DEF_K__HIGH, meaning that the third "word" is missing and HIGH should be placed under V4.

The rule is that each "word" is placed in a new variable based only on its position in the original variable.

The "word" before the first "_" goes to V1, the "word" between the first and the second "_" goes to V2, etc.

If there are two "_" in a row, the corresponding new variable is set to missing.
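
To make the rule concrete, here is a minimal sketch of why a plain SCAN call does not follow it. The variable names w3 and w4 are made up for illustration. Without any extra handling, SCAN treats the two adjacent underscores as a single delimiter, so HIGH is returned as the third word rather than the fourth:

```sas
/* Sketch only: w3 and w4 are hypothetical names used for illustration. */
data _null_;
  length var0 $ 40 w3 w4 $ 8;
  var0 = 'DEF_K__HIGH';
  w3 = scan(var0, 3, '_');   /* returns HIGH, because '__' counts as one delimiter */
  w4 = scan(var0, 4, '_');   /* returns blank, because there is no fourth word */
  put w3= w4=;               /* writes both values to the log */
run;
```

This should run on SAS 9.1.3 as well, since it uses only the three-argument form of SCAN.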

Thank you in advance

Kind regards

Nikos

Respected Advisor
Posts: 3,124

Nikos,

It seems to me that your first post did not contain any value with two '_' in a row.

PROC Star
Posts: 7,363

Nikos,

Then all you need is the "M" modifier in your scan.  e.g., borrowing from Ksharp's code:

data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;

data want(drop=temp i);
set x;
array v{4} $ ;
do i=1 to 4;
  temp=scan(var0,i,'_',"M");
  if not missing(temp) then v{i}=temp;
end;
run;

Super Contributor
Posts: 1,636

Hi Art,

Is there any difference between the following two versions?

data want(drop=temp i);
set x;
array v{4} $ ;
do i=1 to 4;
  temp=scan(var0,i,'_',"M");
  if not missing(temp) then v{i}=temp;
end;
run;

proc print;run;

and

data want(drop=i);
set x;
array v{4} $ ;
do i=1 to 4;
  v{i}=scan(var0,i,'_',"M");
  *if not missing(temp) then v{i}=temp;
end;
run;

proc print;run;

Thanks - Linlin

PROC Star
Posts: 7,363

Linlin,  Yes there is!  The way you programmed it is more efficient!
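
One way to check that the two versions produce the same values is to build both and run PROC COMPARE on them. This is only a sketch: the data set names want1 and want2 are made up for the comparison, and it assumes SAS 9.2 or later because of the "M" modifier.

```sas
data x;
  input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;

/* Version 1: go through a temporary variable and skip blank words */
data want1(drop=temp i);
  set x;
  array v{4} $ ;
  do i=1 to 4;
    temp=scan(var0,i,'_',"M");
    if not missing(temp) then v{i}=temp;
  end;
run;

/* Version 2: assign the SCAN result directly, blank or not */
data want2(drop=i);
  set x;
  array v{4} $ ;
  do i=1 to 4;
    v{i}=scan(var0,i,'_',"M");
  end;
run;

/* Compare the two data sets value by value */
proc compare base=want1 compare=want2;
run;
```

The values should match because v1-v4 are new variables that start out missing on every observation, so skipping the assignment of a blank word leaves the same result as assigning it. The second version simply avoids the extra variable and the extra test.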

Contributor
Posts: 68

You are right.

That was a misstatement of mine.

Contributor
Posts: 68

Hi all,

Unfortunately, SCAN with the "M" modifier is only available from SAS 9.2 onward.

Since I still work with SAS 9.1.3 I would appreciate any workaround.

Thank you in advance

Kind regards

Nikos

Super User
Posts: 6,499

Here are two options.

1) Use TRANWRD to stick a space between the adjacent underscores.

x= tranwrd(x,'__','_ _');
word1=scan(x,1,'_');
...

2) Use the _INFILE_ trick.

data x;
  infile cards dsd dlm='_' truncover;
  input;
  do until(eof);
    set old end=eof;
    _infile_=x;
    input @1 (word1-word4) ($) @;
    output;
  end;
stop;
cards;
junk line
run;

Respected Advisor
Posts: 3,124

Tom,

I added '@' to your first INPUT statement, and it runs perfectly.

data old;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;

data x;
  infile cards dsd dlm='_' truncover;
  input @;
  do until(eof);
    set old end=eof;
    _infile_=var0;
    put 'infile=' _infile_;
    input @1 (word1-word4) ($) @;
    output;
  end;
stop;
cards;
junk line
run;

proc print;run;

Respected Advisor
Posts: 3,124

One possible solution is to go back to your raw data and use the INFILE options:

data x;
infile cards dsd dlm='_' missover;
input (var1-var4) (:$);
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;

proc print;run;

Regards,

Haikuo

☑ This topic is SOLVED.
