## Split h1-h10 into h1,h2,h3,h4,.... h10

Solved
Occasional Contributor
Posts: 12

# Split h1-h10 into h1,h2,h3,h4,.... h10

I want to translate the contents of a column.

For example, the content includes the following value (this is the content of 1 column in 1 row):

G1, G2, G4, X1 -X3, H51-H55, T10

I want to translate that to different rows.

The result should look like this:

G1

G2

G4

X1

X2

X3

H51

H52

H53

H54

H55

T10

I have not problems making rows of the single ones (I mean G1 G2 G4 and T10). The function scan works perfectly.

The problem is how I translate X1-X3 into X1, X2, X3 and the same for H51-H55.

Accepted Solutions
Solution
‎09-02-2013 01:21 PM
Posts: 3,167

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Yes, DN, can never go under your radar -. ANYFIRST should have a shot, while at this stage it seems to me using PRX is more straightforward.

data test;

input var \$ 80.;

length new_var \$ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

if find(_var,'-') then do;

_v1=scan(_var,1,'- ');

_v2=scan(_var,2,'- ');

do _i=prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v2);

new_var=cats(prxchange('s/(\w*\D+)(\d+)/\$1/',-1,_v1),_i);

output;

end;

end;

else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103

;

Haikuo

All Replies
Posts: 3,167

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

With a combination of bunch of Char functions, we can have something like this:

data test;

input var \$ 50.;

length new_var \$ 8;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

if find(_var,'-') then

do _i=compress(scan(_var,1,'-'),,'kd') to compress(scan(_var,2,'-'),,'kd');

new_var=cats(compress(scan(_var,1,'-'),,'d'),_i);

output;

end;

else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10

;

Haikuo

Posts: 3,852

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

What if a word has an imbedded digit? NA3ME1-NA3ME1

Consider using the ANYFIRST function with negative start.

Solution
‎09-02-2013 01:21 PM
Posts: 3,167

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Yes, DN, can never go under your radar -. ANYFIRST should have a shot, while at this stage it seems to me using PRX is more straightforward.

data test;

input var \$ 80.;

length new_var \$ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

if find(_var,'-') then do;

_v1=scan(_var,1,'- ');

_v2=scan(_var,2,'- ');

do _i=prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v2);

new_var=cats(prxchange('s/(\w*\D+)(\d+)/\$1/',-1,_v1),_i);

output;

end;

end;

else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103

;

Haikuo

Occasional Contributor
Posts: 12

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Thanks all. I am using the code already from hai.kuo. Only made a minor change, because the variable could also contain only numbers.

data test;

input var \$ 80.;

length new_var \$ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

if find(_var,'-') then do;

_v1=scan(_var,1,'- ');

_v2=scan(_var,2,'- ');

do _i=prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v2);

IF(prxmatch("/\D+/",strip(_v1))) THEN

new_var=cats(prxchange('s/(\w*\D+)(\d+)/\$1/',-1,_v1),_i);

ELSE

new_var=_i;

output;

end;

end;

else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103, 1-10

;

The only problem I am trying to solve is when you have H01-H10. The above code would split it in the below rows

H1

H2

H3

H4

...

H10

While it should be

H01

H02

H03

H04

...

H10

Super Contributor
Posts: 339

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

That's actually fairly tricky to add within a single data step since you can't get to enforce a zn.w format. I see 2 alternatives

1: Use PRX to capture number of leading 0s from your lower boundary and then control the number of those that are output based on your _i counter (like remove one 0 from the captured lead for each increment in log10(_i)). It would resemble something like this (untested)

length lead \$20.; /*just so that prxchange output does not get converted to numeric*/

...

do _i=prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v2);

IF(prxmatch("/\D+/",strip(_v1))) THEN

lead=prxchange('s/(\w*\D+)(0*)(\d+)/\$2/',-1,_v1); /*you could change \d+ to [1-9]\d+ or something to clearly identify the purpose of 0* but since * is greedy its only a minor efficiency issue */

...

2: Another way would build a series of case based on the length of the fully captured digit as a string for the lower bound and use put& zn.w formats e.g.

do _i=prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v2);

IF(prxmatch("/\D+/",strip(_v1))) THEN

zlen = length(prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v1));

if zlen=1 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/\$1/',-1,_v1),_i);

else if zlen=2 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/\$1/',-1,_v1),put(_i, z2.);

...

else if zlen=8 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/\$1/',-1,_v1),put(_i, z8.));

PROC Star
Posts: 8,163

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

: The method I suggested will retain the leading zeros if present in the variable list.

Posts: 3,852

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

```Arthur Tabachneck wrote:

Richard van't Hoff: The method I suggested will retain the leading zeros if present in the variable list.
```

I think there will be a problem defining variables with name like (at least in an input statement).

34WE100-34WE103, 1-10

I could not get that to work without making the words name literials.  Of course it wouldn't be too hard to add some code to convert those words to name literials.

Posts: 3,852

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Maybe this.

options validvarname=any;
%let list = G1, G2, G4, X01 -X03, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103 1-10 009-005;

data _null_;

length list \$256 w \$32 newlist \$256;

list = transtrn(translate(symget('LIST'),' ',','),'-',' - ');
do i = 1 by 1;
w = scan(list,i,
' ');
if missing(w) then leave;
if w ne '-' then w = nliteral(w);
newlist = catx(' ',newlist,w);
end;

call symputx('LIST',newlist);
put _all_;

run;

data names;
input &list;
stop;

cards;
;;;;

run;

proc transpose data=names out=names2;
var _all_;
run;
proc print;

run;
PROC Star
Posts: 8,163

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

I didn't even notice the 1-10.  Hopefully, that was just a typo by the OP.  I was simply addressing the case like T01-T10.

Occasional Contributor
Posts: 12

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Thanks all.

I finally got it working for everything (even for my last problem).

My solution

data work.testje;

input var \$ 100.;

length new_var \$ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

if find(_var,'-') then do;

_v1=scan(_var,1,'- ');

_v2=scan(_var,2,'- ');

do _i=prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/\$2/',-1,_v2);

_len_new=LENGTH(STRIP(_i));

_len_old=LENGTH(SUBSTR(STRIP(_v1),(LENGTH(STRIP(_v1))-ANYALPHA(STRIP(REVERSE(_v1)))) +2));

_new_i=REPEAT('0',_len_old - _len_new - 1)||STRIP(_i);

_turned=REVERSE(STRIP(_v1));

SUBSTR(_turned,1,_len_old)=REVERSE(STRIP(_new_i));

new_var=REVERSE(STRIP(_turned));

output;

end;

end;

else do;new_var=left(_var);output;end;

end;

cards;

G1, G2, G4, X1 -X3, H051-H055, T10,NA03ME1-NA03ME3, 34WE100-34WE103,001-008, 1-10, H0001- H0010

Posts: 3,852

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

These look like variable lists.  If they are you might be able to use that to your advantage.

PROC Star
Posts: 8,163

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

I agree with datanull.  If they are variable lists, you could use something like the following to convert them into a macro variable and then scan the macro variable.  e.g.:

data have;

informat contents \$80.;

input contents &;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10

;

data _null_;

set have;

contents=translate(contents,' ',',');

call symput('varlist',contents);

run;

data test;

input &varlist.;

cards;

;

run;

proc sql noprint;

select name

into :vars separated by " "

from dictionary.columns

where libname="WORK" and

memname="TEST"

;

quit;

%put &vars.;

Super User
Posts: 6,759

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Art,

Got this from Tom, who got it from DN, who got it elsewhere ...

An alternative ending to your solution maintains the order of the names, and produces output as a single column.  Instead of reading from DICTIONARY.COLUMNS, try:

proc transpose data=test (obs=0) out=want;

var &varlist;

run;

PROC Star
Posts: 8,163

## Re: Split h1-h10 into h1,h2,h3,h4,.... h10

: Yes, I was familiar with the method, but not sure of its origin either.  Tom, Randy Herbison, FriedEgg and I have a nifty use of it in a paper we're presenting at the MWSUG meeting later this month.  Take a look at: http://www.sascommunity.org/mwiki/images/c/c5/S1-14-2013.pdf

However, I usually like the SQL method more often as one can order the variables based on any variable that is contained in dictionary.columns, including name and column number.

🔒 This topic is solved and locked.

Discussion stats
• 14 replies
• 490 views
• 0 likes
• 6 in conversation