Pyrite | Level 9

## creating D02-D05 from D02;D03;D04;D05 and more complicated things

Hello,

i have a variable separated by ;

Examples :

D01;D02;D03;D04;D05;                                           should become D01-D05

D01;D02;D03;D05;                                                  should become D01-D03;D05

D10;D11;D12;D13;D17;D20;D21;D22                        should become D10-D13;D17;D20-D22;

Any help to get me on my way?

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Onyx | Level 15

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Here is one version that can be used for more general conditions, such as undefined length of substrings, undefined number of character or digits within the substrings.

data have;

input var \$ 60.;

datalines4;

D01;D02;D03;D04;D05;AA101;AA102;AA103;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_:);

set have;

length _var VarFinal \$100.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

if _i=1 then _sv=scan(var,_i,';');

_mv1=scan(var,_i,';');

_mc1=compress(_mv1,,'ka');

_mn1=compress(_mv1,,'kd');

_mv2=scan(var,_i+1,';');

_mc2=compress(_mv2,,'ka');

_mn2=compress(_mv2,,'kd');

if (_mc1 ne _mc2 or _mn2-_mn1>1) then

do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

10 REPLIES 10
Onyx | Level 15

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Following code can also applied to more general condition, such as: D01;E01;

Limitation: Can only work on 3-character substring like this: D01; while it can be tweaked to accomodate more complex scenarios.

data have;

input var \$40.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_:);

set have;

length _var VarFinal \$40.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

if _i=1 then _sv=scan(var,_i,';');

_mv1=scan(var,_i,';');

_mv2=scan(var,_i+1,';');

if (first(_mv1) ne first(_mv2) or substr(_mv2,2,2)-substr(_mv1,2,2)>1) then

do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Regards,

Haikuo

Onyx | Level 15

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Here is one version that can be used for more general conditions, such as undefined length of substrings, undefined number of character or digits within the substrings.

data have;

input var \$ 60.;

datalines4;

D01;D02;D03;D04;D05;AA101;AA102;AA103;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_:);

set have;

length _var VarFinal \$100.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

if _i=1 then _sv=scan(var,_i,';');

_mv1=scan(var,_i,';');

_mc1=compress(_mv1,,'ka');

_mn1=compress(_mv1,,'kd');

_mv2=scan(var,_i+1,';');

_mc2=compress(_mv2,,'ka');

_mn2=compress(_mv2,,'kd');

if (_mc1 ne _mc2 or _mn2-_mn1>1) then

do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Onyx | Level 15

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Going further, here is another more general approach which works on unsorted substrings with the same variable:

data have;

input var \$ 60.;

datalines4;

AA103;D01;D05;D03;D04;D02;AA101;AA102;

F01;D02;D03;D05;E01;D01;F02;

D1000;D11;D22;D13;D17;D20;D21;D12;

;;;;

proc sql;

select max(countw(var)) into :num from have;quit;

filename new_v "h:\temp\new_v.txt";

data _null_ ;

set have;

file new_v;

array v \$10 _v1-_v%sysevalf(&num+1);

do _i=1 by 1 until (missing (scan(var,_i,';')));

v(_i)=scan(var,_i,';');

end;

call sortc(of v(*));

put v(*);

run;

data want (drop=_:) ;

infile new_v;

length _var VarFinal \$100.;

input var : & \$300.;

do _i=1 by 1 until (missing (scan(var,_i)));

if _i=1 then _sv=scan(var,_i);

_mv1=scan(var,_i);

_mc1=compress(_mv1,,'ka');

_mn1=compress(_mv1,,'kd');

_mv2=scan(var,_i+1);

_mc2=compress(_mv2,,'ka');

_mn2=compress(_mv2,,'kd');

if (_mc1 ne _mc2 or abs(_mn2-_mn1)>1 ) then

do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Pyrite | Level 9

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

I have to calculate a second variable now:

I have the same strings:

D01;D02;D03;D04;D05;                                           should become DDDDD

D01;D02;D03;D05;                                                  should become DDD-D

D10;D11;D12;D13;D17;D20;D21;D22                        should become ---------DDDD---D--DDD;

Onyx | Level 15

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Question:

The '-' in between groups, is it always just one? or quantiy of '-'s reflects the size of the gap?

Pyrite | Level 9

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

the - reflects the size of the gap, so for each number that is not in the row there is a -

Onyx | Level 15

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

I see you already have an answer. Anyway, if pursuing the same logic that I have applied on the previous ones, here is one solution:

This only applied to the same character scenario, as I don't know your rules about transits between , say D02 to FF03, if there are such things in real life.

data have;

input var \$40.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_:);

set have;

length  VarFinal \$40.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

if _i=1 then

do;

_sv=scan(var,_i,';');

_svc=substr(_sv,1,1);

VarFinal=_svc;

end;

_mv1=scan(var,_i,';');

_mv2=scan(var,_i+1,';');

_m21=substr(_mv2,2,2)-substr(_mv1,2,2);

if _m21=1 then Varfinal=cats(Varfinal,_svc);

else if _m21>1 then

do;

do _k=1 by 1 to (_m21-1);

VarFinal=cats(VarFinal,'-');

end;VarFinal=cats(VarFinal,_svc);

end;

end;

run;

proc print;run;

Pyrite | Level 9

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

This one is actually easier than the original problem.

data have;

input var \$ 60.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;

A10;A11;A12;A13;A17;A20;A21;A22;

;;;;

data want (drop=_:);

set have;

length newvar \$100 _subvar \$5 _gapchar \$100;

do _i = 1 by 1 until ( missing( _subvar ) );

_subvar = scan( var, _i, ';' );

_prefix = compress( _subvar,, 'ka' );

_suffix = compress( _subvar,, 'kd' );

_lagsuffix = lag( _suffix );

_gapchar = '';

* check for gap;

_gap = _suffix - coalesce(_lagsuffix,0) - 1;

if _gap > 0 then _gapchar = repeat( '-', _gap-1 );

newvar = cats( newvar, _gapchar, _prefix );

end;

run;

Pyrite | Level 9

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Here is another variation using the LAG function:

data want (drop=_:);

set have;

length newvar \$100 _subvar \$5;

_isincrement = 0;

do _i = 1 by 1 until ( missing( _subvar ) );

_subvar = scan( var, _i, ';' );

_prefix = compress( _subvar,, 'ka' );

_lagprefix = lag( _prefix );

_suffix = compress( _subvar,, 'kd' );

_lagsuffix = lag( _suffix );

_lagsubvar = lag( _subvar );

if _prefix = _lagprefix and _suffix = _lagsuffix + 1 then do;

_isincrement = 1;

end;

else do;

if _isincrement then newvar = catx( '-', newvar, _lagsubvar );

newvar = catx( ';', newvar, _subvar );

_isincrement = 0;

end;

end;

run;

Super User

## Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

I prefer to Linlin's solution.

```data have;
input var \$40.;
datalines4;
D01;D02;D03;D04;D05;
D01;D02;D03;D05;
D10;D11;D12;D13;D17;D20;D21;D22;
;;;;
run;
data temp1(keep=group d num);
set have;
group+1;
i=1;
d=substr(scan(var,i,';'),1,1);
num=input(substr(scan(var,i,';'),2),best8.);
do while(not missing(d));
output;
i+1;
d=substr(scan(var,i,';'),1,1);
num=input(substr(scan(var,i,';'),2),best8.);
end;
run;
proc sort data=temp1;by group num;run;
data temp2(keep=group num);
set temp1(rename=(num=_num));
by group;
if last.group then do;
do num=1 to _num;
output;
end;
end;
run;
data x;
merge temp1 temp2;
by group num;
if missing(d) then d='-';
run;
data want(keep=have want);
set x;
by group;
length have want \$ 400;
retain have want;
if first.group then call missing(want,have);
if d ne '-' then have=catx(';',have,cats(d,num));
want=cats(want,d);
if last.group then output;
run;

```

Ksharp

Discussion stats
• 10 replies
• 1652 views
• 0 likes
• 4 in conversation