Solved
New Contributor
Posts: 2

# How to substring a dataset variable

Hello,

I would like to substring a column (variablex) in a dataset  by every two characters.

Essentially turning this...

dataset f1

variablex

Row1     0236571621

Row2     23916303457 686941

Row3     133411

Row4     0101292244

into this....

dataset f2

var1     var2     var3     var4     var5     var6     var7     var8     var9

02       36        57       16       21

23       91        63       03       45        7        68       69        41

13       34        11

01       01        29       22       44

The length of variablex can go up to 500 and varies for each row so creating individual substring values (see below) is not practical.

var1 = substr(variablex,1,2);

var2 = substr(variablex,3,2);

var3 = substr(variablex,5,2);

var4 = substr(variablex,7,2);

The number of rows in dataset f1 also varies and can be up to 150k rows long.

Is there any way to do this in SAS?

Thanks

Accepted Solutions
Solution
‎09-02-2014 10:08 AM
Super User
Posts: 10,778

## Re: How to substring a dataset variable

data have;

input variablex &\$100.;

cards;

0236571621

23916303457 686941

133411

0101292244

;

run;

proc sql noprint;

select max(length(variablex)) into : n separated by ' ' from have;

quit;

%put &n ;

data want(drop=i n);

set have;

array x{*} \$ var1-var%eval(&n/2) ;

n=0;

do i=1 to &n by 2;

n+1;x{n}=substr(variablex,i,2);

end;

run;

Xia Keshan

All Replies
Super User
Posts: 5,878

## Re: How to substring a dataset variable

Try Array and do loop logic,

Data never sleeps
New Contributor
Posts: 2

## Re: How to substring a dataset variable

Hi LinusH,

Thanks for your reply. Are you able to provide any more details as to the Array?

Thanks

Posts: 1,147

## Re: How to substring a dataset variable

data have;

input variablex &\$100.;

cards;

0236571621

23916303457 686941

133411

0101292244

;

data want;

set have;

array new(9) var1-var9;

array val(9) _temporary_ (1 3 5 7 9 11 13 15 17);

do i = 1 to 9;

new(i)=substr(variablex,val(i),2);

end;

run;

Thanks,

Jag

Thanks,
Jag
Solution
‎09-02-2014 10:08 AM
Super User
Posts: 10,778

## Re: How to substring a dataset variable

data have;

input variablex &\$100.;

cards;

0236571621

23916303457 686941

133411

0101292244

;

run;

proc sql noprint;

select max(length(variablex)) into : n separated by ' ' from have;

quit;

%put &n ;

data want(drop=i n);

set have;

array x{*} \$ var1-var%eval(&n/2) ;

n=0;

do i=1 to &n by 2;

n+1;x{n}=substr(variablex,i,2);

end;

run;

Xia Keshan

Super User
Posts: 6,765

## Re: How to substring a dataset variable

A small tweak to a good solution:  SUBSTR defines new variables as longer than you might think.  Shorten the new variables when defining them on the ARRAY statement:

array x{*} \$ 2 var1-var%eval(&n/2) ;

Posts: 3,852

## Re: How to substring a dataset variable

I wonder if there is any advantage,other than not having to fiddle with the array, to using INFILE magic.  You still need to measure to create the proper number of variables.

data str;
input str \$char50.;

cards;
0236571621
23916303457 686941
133411
0101292244
989x34
;;;;
run;
filename FT15F001 temp;
data by2;
infile FT15F001;
input @;
do while(not eof);
set str end=eof;
_infile_ = str;

input @1 (var1-var10)(\$char2.) @;
output;

end;

stop;

parmcards;
Necessary evil
;;;;

run;
proc print;

run;
Super Contributor
Posts: 319

## Re: How to substring a dataset variable

Hello,

One solution:

data have;
infile datalines truncover;
input variablex \$500.;
datalines;
0236571621
23916303457 6869418
133411
0101292244
;

*determine the real number of variables;
proc sql noprint;
select
case mod(max(length(strip(variablex))),2)
when 0 then max(length(strip(variablex))) / 2
else (max(length(strip(variablex)))+1 ) / 2
end into :maxlength from have;
quit;

%put &maxlength;

data want;
set have;

array var{&maxlength} \$ 2;

do i=1 to round(length(strip(variablex))/2) ;
var{i}=substr(variablex,2*(i-1)+1,2);
end;

drop i;
run;

Super User
Posts: 9,599

## Re: How to substring a dataset variable

Not really a need to check how many possibilities or setup arrays, output each block of 2 as row then transpose up:

data have;

length varx \$200.;

varx="0236571621"; output;

varx="23916303457 686941"; output;

varx="133411"; output;

varx="0101292244"; output;

run;

proc sort data=have;

by varx;

run;

data inter;

set have;

do i=1 to 200 by 2;

text=substr(varx,i,2);

if strip(text) ne "" then output;

end;

run;

proc transpose data=inter out=want prefix=var;

by varx;

var text;

run;

Super User
Posts: 13,525

## Re: How to substring a dataset variable

Maybe slightly better: do I = 1 to length(varx) by 2;

🔒 This topic is solved and locked.