Nikos
Fluorite | Level 6

Hi,

I have a variable with character values as:

VAR0


ABC_F_P_LOW

DEF_K_HIGH

ZPQ_M_X_MEDIUM

GKL_J_B

.....................

I would like to create four new variables as:

V1    V2    V3    V4
ABC   F     P     LOW
DEF   K           HIGH
ZPQ   M     X     MEDIUM
GKL   J     B

.................................

I have tried several character functions, but they do not place the "words" in the correct new variable (e.g. HIGH goes to V3 when it should go to V4).

I would appreciate any suggestions.

Thank you in advance

Kind regards

Nikos

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

Combining Art's and Tom's methods:

data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;

data want;
set x;
array v{4} $ ;
do _n_=1 to 4;
  v{_n_}=scan(tranwrd(var0,'__','_ _'),_n_,'_');
end;

proc print;run;

obs    VAR0              v1     v2    v3    v4
1      ABC_F_P_LOW       ABC    F     P     LOW
2      DEF_K__HIGH       DEF    K           HIGH
3      ZPQ_M_X_MEDIUM    ZPQ    M     X     MEDIUM
4      GKL_J_B           GKL    J     B
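
One caveat with the single TRANWRD call: as far as I can tell, TRANWRD replaces non-overlapping matches from left to right, so a run of three or more underscores still contains a '__' pair after one pass. The sketch below nests TRANWRD twice to cover that case. The value 'DEF___HIGH' and the data set name want3 are made up for illustration and do not come from the original data.

```sas
data want3;
  length var0 $40;
  /* Hypothetical value with two empty fields in a row */
  var0 = 'DEF___HIGH';
  array v{4} $ ;
  do _n_ = 1 to 4;
    /* First pass gives 'DEF_ __HIGH', second pass gives 'DEF_ _ _HIGH', */
    /* so every empty field becomes a blank "word" that SCAN can count.  */
    v{_n_} = scan(tranwrd(tranwrd(var0,'__','_ _'),'__','_ _'), _n_, '_');
  end;
run;

proc print; run;
```

After the first pass no run of underscores should be longer than two, so two passes ought to be enough for any number of consecutive delimiters.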


Ksharp
Super User

OK.

But I think you should post some more data to clarify your question.

data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K_HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;
data want(drop=temp i);
 set x;
 array v{4} $ ;
 do i=1 to 4;
  temp=scan(var0,i,'_');
  if temp not in ('LOW' 'HIGH' 'MEDIUM' ' ' )  then v{i}=temp;
   else if not missing(temp) then v{4}=temp;
 end;
run;


Ksharp

Patrick
Opal | Level 21

If this is only about assigning the terms Low, Medium, High to the 4th variable then Ksharp provided you with a solution.

If there are also other "strings" which need to be assigned to specific variables, then you would have to tell us a bit more about the rule that decides which "word" goes where. Otherwise you need an exhaustive list of words together with the place each should go (a kind of key/value pair list).

art297
Opal | Level 21

Nikos, you have to clarify your rules.  Just in your example, you have high going to V4, but B going to V3.  Why?

Astounding
PROC Star

Nikos,

As others have noted, KSharp's solution is a good one.

Here are a few situations that you may have to address ... it depends on what your data contains.

Is it possible that the third word (or even the first or second word) could also contain HIGH / MEDIUM / LOW:

ABC_F_LOW_LOW

In that case, how would you know how to parse this:

ABC_F_LOW

Should LOW definitely become V4, or is it possible it could become V3?

Do you know enough about the data to assign shorter lengths to V1-V4? 

The answers may be very easy, but you're the only one who would know.

Good luck.
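
On the question of lengths: a minimal sketch of how shorter lengths could be declared, assuming the sample values are representative. The lengths 3, 1, 1 and 6 are guesses based on the four example rows only, and the plain SCAN call is a stand-in for whatever parsing rule ends up being used.

```sas
data x;
  input VAR0 $40.;
  cards;
ABC_F_P_LOW
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;

data want(drop=i);
  set x;
  /* Assumed lengths: 3 for v1, 1 for v2 and v3, 6 for v4 (long enough for MEDIUM) */
  length v1 $3 v2 v3 $1 v4 $6;
  array v{4} v1-v4;
  do i = 1 to 4;
    /* Placeholder parsing: the i-th word between underscores */
    v{i} = scan(var0, i, '_');
  end;
run;
```

Without the LENGTH statement, `array v{4} $ ;` creates the four variables with the default character length of 8, which is why the longer declaration is optional for this sample.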

Nikos
Fluorite | Level 6

Art,

You were right. The particular value should read DEF_K__HIGH, meaning that the third "word" is missing and HIGH should be placed under V4.

The rule is that each "word" is positioned under a new variable based only on its place in the initial variable.

The "word" before the first "_" goes to V1, the "word" between the first "_" and the second "_" goes to V2, etc.

If there are two "_" in a row, then a missing value goes to the respective new variable.

Thank you in advance

Kind regards

Nikos

Haikuo
Onyx | Level 15

Nikos,

It seems to me that your first post does not contain any value with two '_' in a row.

art297
Opal | Level 21

Nikos,

Then all you need is the "M" modifier in your scan.  e.g., borrowing from Ksharp's code:

data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;

data want(drop=temp i);
set x;
array v{4} $ ;
do i=1 to 4;
  temp=scan(var0,i,'_',"M");
  if not missing(temp) then v{i}=temp;
end;
run;

Linlin
Lapis Lazuli | Level 10

Hi Art,

Is there any difference between

data want(drop=temp i);
set x;
array v{4} $ ;
do i=1 to 4;
  temp=scan(var0,i,'_',"M");
  if not missing(temp) then v{i}=temp;
end;
run;

proc print;run;

and

data want(drop=i);
set x;
array v{4} $ ;
do i=1 to 4;
  v{i}=scan(var0,i,'_',"M");
  *if not missing(temp) then v{i}=temp;
end;
run;

proc print;run;

Thanks - Linlin

art297
Opal | Level 21

Linlin,  Yes there is!  The way you programmed it is more efficient!

Nikos
Fluorite | Level 6

You are right.

That was a misstatement of mine.

Nikos
Fluorite | Level 6

Hi all,

Unfortunately, SCAN with the "M" modifier works only under SAS 9.2.

Since I still work with SAS 9.1.3, I would appreciate any workaround.

Thank you in advance

Kind regards

Nikos

Tom
Super User

Here are two options.

1) Use TRANWRD to stick a space between the adjacent underscores.

x= tranwrd(x,'__','_ _');
word1=scan(x,1,'_');
...

2) Use the _INFILE_ trick.

data x;
  infile cards dsd dlm='_' truncover;
  input;
  do until(eof);
    set old end=eof;
    _infile_=x;
    input @1 (word1-word4) ($) @;
    output;
  end;
stop;
cards;
junk line
run;

Haikuo
Onyx | Level 15

Tom,

I added '@' to your first input statement, and it runs perfectly.

data old;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;

data x;
  infile cards dsd dlm='_' truncover;
  input @;
  do until(eof);
    set old end=eof;
    _infile_=var0;
    put 'infile=' _infile_;
    input @1 (word1-word4) ($) @;
    output;
  end;
stop;
cards;
junk line
run;

proc print;run;

Haikuo
Onyx | Level 15

One possible solution is to go back to your raw data and use the 'infile' options:

data x;
infile cards dsd dlm='_' missover;
input (var1-var4) (:$);
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;

proc print;run;

Regards,

Haikuo
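
For anyone still on SAS 9.1.3, here is one more way to apply the positional rule Nikos describes without the "M" modifier: cut the string at each underscore with INDEX and SUBSTR. This is only a sketch and was not posted in the thread. The names rest and pos are illustrative, and the data set x with VAR0 $40 is assumed to be the one created in the earlier posts.

```sas
data want(drop=rest pos i);
  set x;
  array v{4} $ 40;
  /* One character longer than VAR0 so that pos+1 is always a valid position */
  length rest $ 41;
  rest = var0;
  do i = 1 to 4;
    pos = index(rest, '_');              /* position of the next delimiter, 0 if none */
    if pos = 0 then do;
      v{i} = rest;                       /* last word: take whatever is left */
      rest = ' ';
    end;
    else do;
      /* pos = 1 means an empty field, so v{i} is left missing */
      if pos > 1 then v{i} = substr(rest, 1, pos - 1);
      rest = substr(rest, pos + 1);      /* drop the word and its delimiter */
    end;
  end;
run;

proc print; run;
```

Because each delimiter is consumed exactly once, consecutive underscores leave the corresponding variable missing, however many occur in a row.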
