I want to translate the contents of a column.
For example, the content includes the following value (this is the content of 1 column in 1 row):
G1, G2, G4, X1 -X3, H51-H55, T10
I want to translate that to different rows.
The result should look like this:
G1
G2
G4
X1
X2
X3
H51
H52
H53
H54
H55
T10
I have not problems making rows of the single ones (I mean G1 G2 G4 and T10). The function scan works perfectly.
The problem is how I translate X1-X3 into X1, X2, X3 and the same for H51-H55.
Yes, DN, can never go under your radar -:). ANYFIRST should have a shot, while at this stage it seems to me using PRX is more straightforward.
data test;
input var $ 80.;
length new_var $ 20;
do _j=1 to countw(var,',');
_var=scan(var,_j,',');
if find(_var,'-') then do;
_v1=scan(_var,1,'- ');
_v2=scan(_var,2,'- ');
do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1) to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);
new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);
output;
end;
end;
else do;new_var=left(_var);output;end;
end;
drop _:;
cards;
G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103
;
Haikuo
With a combination of bunch of Char functions, we can have something like this:
data test;
input var $ 50.;
length new_var $ 8;
do _j=1 to countw(var,',');
_var=scan(var,_j,',');
if find(_var,'-') then
do _i=compress(scan(_var,1,'-'),,'kd') to compress(scan(_var,2,'-'),,'kd');
new_var=cats(compress(scan(_var,1,'-'),,'d'),_i);
output;
end;
else do;new_var=left(_var);output;end;
end;
drop _:;
cards;
G1, G2, G4, X1 -X3, H51-H55, T10
;
Haikuo
What if a word has an imbedded digit? NA3ME1-NA3ME1
Consider using the ANYFIRST function with negative start.
Yes, DN, can never go under your radar -:). ANYFIRST should have a shot, while at this stage it seems to me using PRX is more straightforward.
data test;
input var $ 80.;
length new_var $ 20;
do _j=1 to countw(var,',');
_var=scan(var,_j,',');
if find(_var,'-') then do;
_v1=scan(_var,1,'- ');
_v2=scan(_var,2,'- ');
do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1) to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);
new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);
output;
end;
end;
else do;new_var=left(_var);output;end;
end;
drop _:;
cards;
G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103
;
Haikuo
Thanks all. I am using the code already from hai.kuo. Only made a minor change, because the variable could also contain only numbers.
data test;
input var $ 80.;
length new_var $ 20;
do _j=1 to countw(var,',');
_var=scan(var,_j,',');
if find(_var,'-') then do;
_v1=scan(_var,1,'- ');
_v2=scan(_var,2,'- ');
do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1) to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);
IF(prxmatch("/\D+/",strip(_v1))) THEN
new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);
ELSE
new_var=_i;
output;
end;
end;
else do;new_var=left(_var);output;end;
end;
drop _:;
cards;
G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103, 1-10
;
The only problem I am trying to solve is when you have H01-H10. The above code would split it in the below rows
H1
H2
H3
H4
...
H10
While it should be
H01
H02
H03
H04
...
H10
That's actually fairly tricky to add within a single data step since you can't get to enforce a zn.w format. I see 2 alternatives
1: Use PRX to capture number of leading 0s from your lower boundary and then control the number of those that are output based on your _i counter (like remove one 0 from the captured lead for each increment in log10(_i)). It would resemble something like this (untested)
length lead $20.; /*just so that prxchange output does not get converted to numeric*/
...
do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1) to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);
IF(prxmatch("/\D+/",strip(_v1))) THEN
lead=prxchange('s/(\w*\D+)(0*)(\d+)/$2/',-1,_v1); /*you could change \d+ to [1-9]\d+ or something to clearly identify the purpose of 0* but since * is greedy its only a minor efficiency issue */
new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1), length), substrn(left(lead), 1, length(trim(left(lead)))-(floor(log10(_i))-floor(log10(prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)))) ) ,_i);
...
drop lead;
2: Another way would build a series of case based on the length of the fully captured digit as a string for the lower bound and use put& zn.w formats e.g.
do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1) to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);
IF(prxmatch("/\D+/",strip(_v1))) THEN
zlen = length(prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1));
if zlen=1 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);
else if zlen=2 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),put(_i, z2.);
...
else if zlen=8 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),put(_i, z8.));
Arthur Tabachneck wrote:
Richard van't Hoff: The method I suggested will retain the leading zeros if present in the variable list.
I think there will be a problem defining variables with name like (at least in an input statement).
34WE100-34WE103, 1-10
I could not get that to work without making the words name literials. Of course it wouldn't be too hard to add some code to convert those words to name literials.
Maybe this.
Thanks all.
I finally got it working for everything (even for my last problem).
My solution
data work.testje;
input var $ 100.;
length new_var $ 20;
do _j=1 to countw(var,',');
_var=scan(var,_j,',');
if find(_var,'-') then do;
_v1=scan(_var,1,'- ');
_v2=scan(_var,2,'- ');
do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1) to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);
_len_new=LENGTH(STRIP(_i));
_len_old=LENGTH(SUBSTR(STRIP(_v1),(LENGTH(STRIP(_v1))-ANYALPHA(STRIP(REVERSE(_v1)))) +2));
_new_i=REPEAT('0',_len_old - _len_new - 1)||STRIP(_i);
_turned=REVERSE(STRIP(_v1));
SUBSTR(_turned,1,_len_old)=REVERSE(STRIP(_new_i));
new_var=REVERSE(STRIP(_turned));
output;
end;
end;
else do;new_var=left(_var);output;end;
end;
cards;
G1, G2, G4, X1 -X3, H051-H055, T10,NA03ME1-NA03ME3, 34WE100-34WE103,001-008, 1-10, H0001- H0010
These look like variable lists. If they are you might be able to use that to your advantage.
I agree with datanull. If they are variable lists, you could use something like the following to convert them into a macro variable and then scan the macro variable. e.g.:
data have;
informat contents $80.;
input contents &;
cards;
G1, G2, G4, X1 -X3, H51-H55, T10
;
data _null_;
set have;
contents=translate(contents,' ',',');
call symput('varlist',contents);
run;
data test;
input &varlist.;
cards;
;
run;
proc sql noprint;
select name
into :vars separated by " "
from dictionary.columns
where libname="WORK" and
memname="TEST"
;
quit;
%put &vars.;
Art,
Got this from Tom, who got it from DN, who got it elsewhere ...
An alternative ending to your solution maintains the order of the names, and produces output as a single column. Instead of reading from DICTIONARY.COLUMNS, try:
proc transpose data=test (obs=0) out=want;
var &varlist;
run;
: Yes, I was familiar with the method, but not sure of its origin either. Tom, Randy Herbison, FriedEgg and I have a nifty use of it in a paper we're presenting at the MWSUG meeting later this month. Take a look at: http://www.sascommunity.org/mwiki/images/c/c5/S1-14-2013.pdf
However, I usually like the SQL method more often as one can order the variables based on any variable that is contained in dictionary.columns, including name and column number.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.