Hi,
I have a character variable with values such as:
VAR0
ABC_F_P_LOW
DEF_K_HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
...
I would like to split it into four new variables:
V1   V2   V3   V4
ABC  F    P    LOW
DEF  K         HIGH
ZPQ  M    X    MEDIUM
GKL  J    B
...
I have tried some character functions, but they do not place each "word" in the correct new variable (e.g., HIGH goes to V3 when it should go to V4).
I would appreciate any suggestions.
Thank you in advance
Kind regards
Nikos
Combining Art's and Tom's methods:
data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
data want;
set x;
array v{4} $ ;
do _n_=1 to 4;
/* put a blank between adjacent underscores so SCAN counts an empty word there */
v{_n_}=scan(tranwrd(var0,'__','_ _'),_n_,'_');
end;
proc print;run;
Obs  VAR0            v1   v2  v3  v4
1    ABC_F_P_LOW     ABC  F   P   LOW
2    DEF_K__HIGH     DEF  K       HIGH
3    ZPQ_M_X_MEDIUM  ZPQ  M   X   MEDIUM
4    GKL_J_B         GKL  J   B
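A caveat on the single TRANWRD pass above: TRANWRD replaces non-overlapping matches from left to right, so three underscores in a row ('AB___CD') become 'AB_ __CD', and SCAN then still skips the remaining pair. The sketch below, assuming the same data set X, simply repeats the replacement until no '__' is left. The work variable TMP, its $80 length, the $10 length for V1-V4 and the extra test row AB___CD are illustrative assumptions. Without the M modifier SCAN also skips leading delimiters, so a missing first word (a value starting with '_') is still not handled.

```sas
data x;
  input VAR0 $40.;
  cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
AB___CD
;
run;

data want(drop=tmp i);
  set x;
  length tmp $ 80;   /* assumed length: room for the blanks inserted below */
  array v{4} $ 10;   /* creates V1-V4 with an assumed length of 10 */
  tmp = var0;
  /* one pass can leave a '__' behind when 3+ underscores are adjacent, so repeat */
  do while (index(tmp, '__') > 0);
    tmp = tranwrd(tmp, '__', '_ _');
  end;
  /* each blank between two underscores is now a word, so SCAN keeps positions */
  do i = 1 to 4;
    v{i} = scan(tmp, i, '_');
  end;
run;

proc print data=want;
run;
```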
OK.
But I think you should post some more data to clarify your question.
data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K_HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;
data want(drop=temp i);
set x;
array v{4} $ ;
do i=1 to 4;
temp=scan(var0,i,'_');
/* ordinary words keep their position; LOW/HIGH/MEDIUM always go to V4 */
if temp not in ('LOW' 'HIGH' 'MEDIUM' ' ' ) then v{i}=temp;
else if not missing(temp) then v{4}=temp;
end;
run;
Ksharp
If this is only about assigning the terms LOW, MEDIUM and HIGH to the 4th variable, then Ksharp has provided you with a solution.
If there are also other "strings" that need to be assigned to specific variables, then you would have to tell us a bit more about the rule that decides which "word" goes where, or you need an exhaustive list of words together with the position each one should go to (a kind of key/value pair list).
Nikos, you have to clarify your rules. Even in your example, you have HIGH going to V4 but B going to V3. Why?
Nikos,
As others have noted, Ksharp's solution is a good one.
Here are a few situations that you may have to address; it depends on what your data contains.
Is it possible that the third word (or even the first or second word) could also be HIGH / MEDIUM / LOW, e.g.:
ABC_F_LOW_LOW
In that case, how would you know how to parse this:
ABC_F_LOW
Should LOW definitely become V4, or is it possible it could become V3?
Do you know enough about the data to assign shorter lengths to V1-V4?
The answers may be very easy, but you're the only one who would know.
Good luck.
Art,
You were right. That particular value should read DEF_K__HIGH, meaning that the third "word" is missing and HIGH should be placed in V4.
The rule is that each "word" goes to a new variable based only on its position in the original variable:
the "word" before the first "_" goes to V1, the "word" between the first and second "_" goes to V2, and so on.
If there are two "_" in a row, the corresponding new variable is missing.
Thank you in advance
Kind regards
Nikos
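The positional rule stated above can also be coded directly with INDEX and SUBSTR, which should work in releases that lack newer SCAN options. This is a minimal sketch, assuming VAR0 holds at most four words; it walks the string one delimiter at a time, so a leading underscore, two or more adjacent underscores, or a missing trailing word all leave the corresponding variable blank. The helper variables REST and POS, the $10 length for V1-V4 and the extra test row _F_P_LOW are illustrative assumptions.

```sas
data x;
  input VAR0 $40.;
  cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
_F_P_LOW
;
run;

data want(drop=rest pos i);
  set x;
  /* REST is one byte longer than VAR0, so SUBSTR(rest, pos + 1) stays in range */
  length v1-v4 $ 10 rest $ 41;
  array v{4} $ v1-v4;
  rest = var0;
  do i = 1 to 4;
    pos = index(rest, '_');          /* position of the next delimiter, 0 if none */
    if pos = 0 then do;              /* no delimiter left: the remainder is the last word */
      v{i} = rest;
      rest = ' ';
    end;
    else do;
      /* pos = 1 means an empty word (leading or doubled '_'), so V{i} stays blank */
      if pos > 1 then v{i} = substr(rest, 1, pos - 1);
      rest = substr(rest, pos + 1);  /* drop the word and its delimiter */
    end;
  end;
run;

proc print data=want;
run;
```

Words beyond the fourth are ignored in this sketch; enlarging the array (and the loop bound) would keep them.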
Nikos,
It seems to me that your first post did not include any case of two '_' in a row.
Nikos,
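Then all you need is the "M" modifier in SCAN, which makes consecutive delimiters (and leading or trailing ones) mark empty words instead of being skipped. E.g., borrowing from Ksharp's code: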
data x;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
data want(drop=temp i);
set x;
array v{4} $ ;
do i=1 to 4;
temp=scan(var0,i,'_',"M");
if not missing(temp) then v{i}=temp;
end;
run;
Hi Art,
is there any difference between
data want(drop=temp i);
set x;
array v{4} $ ;
do i=1 to 4;
temp=scan(var0,i,'_',"M");
if not missing(temp) then v{i}=temp;
end;
run;
proc print;run;
and
data want(drop=i);
set x;
array v{4} $ ;
do i=1 to 4;
v{i}=scan(var0,i,'_',"M");
*if not missing(temp) then v{i}=temp;
end;
run;
proc print;run;
Thanks - Linlin
Linlin, yes, there is! The way you programmed it is more efficient.
You are right.
That was a misstatement of mine.
Hi all,
Unfortunately, the "M" modifier of SCAN only works from SAS 9.2 onward.
Since I still work with SAS 9.1.3, I would appreciate any workaround.
Thank you in advance
Kind regards
Nikos
Here are two options.
1) Use TRANWRD to stick a space between the adjacent underscores.
x= tranwrd(x,'__','_ _'); word1=scan(x,1,'_'); ...
2) Use the _INFILE_ trick.
data x;
infile cards dsd dlm='_' truncover;
input;
do until(eof);
set old end=eof;
_infile_=x;
input @1 (word1-word4) ($) @;
output;
end;
stop;
cards;
junk line
run;
Tom,
After adding '@' to your first INPUT statement, it runs perfectly.
data old;
input VAR0 $40.;
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;
data x;
infile cards dsd dlm='_' truncover;
input @;
do until(eof);
set old end=eof;
_infile_=var0;
put 'infile=' _infile_;
input @1 (word1-word4) ($) @;
output;
end;
stop;
cards;
junk line
run;
proc print;run;
One possible solution is to go back to your raw data and use INFILE options (with DSD, two delimiters in a row are read as a missing value):
data x;
infile cards dsd dlm='_' missover;
input (var1-var4) (:$);
cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
proc print;run;
Regards,
Haikuo
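If the values are already stored in a SAS data set rather than in raw data, the same INFILE options can still be used by first writing VAR0 out to a temporary file. This is a minimal sketch, assuming the input data set is X as in the earlier posts and that FILENAME ... TEMP is available in your release; the fileref SPLIT, the data set PARTS and the $10 length for V1-V4 are illustrative assumptions. A one-to-one MERGE (no BY statement) then lines the new variables up with VAR0 by row position.

```sas
data x;
  input VAR0 $40.;
  cards;
ABC_F_P_LOW
DEF_K__HIGH
ZPQ_M_X_MEDIUM
GKL_J_B
;
run;

filename split temp;   /* temporary file, removed when the session ends */

/* write one raw record per observation */
data _null_;
  set x;
  file split;
  put var0;
run;

/* read it back: DSD makes '__' a missing value, TRUNCOVER blanks missing trailing words */
data parts;
  infile split dsd dlm='_' truncover;
  input (v1-v4) (:$10.);
run;

/* one-to-one merge: rows are matched by position */
data want;
  merge x parts;
run;

proc print data=want;
run;
```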