BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Filipvdr
Pyrite | Level 9

Hello,

i have a variable separated by ;

Examples :

D01;D02;D03;D04;D05;                                           should become D01-D05

D01;D02;D03;D05;                                                  should become D01-D03;D05

D10;D11;D12;D13;D17;D20;D21;D22                        should become D10-D13;D17;D20-D22;

Any help to get me on my way?

Thanks! 

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Here is one version that can be used for more general conditions, such as undefined length of substrings, undefined number of character or digits within the substrings.

data have;

input var $ 60.;

datalines4;

D01;D02;D03;D04;D05;AA101;AA102;AA103;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_:);

set have;

length _var VarFinal $100.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

  if _i=1 then _sv=scan(var,_i,';');

  _mv1=scan(var,_i,';');

  _mc1=compress(_mv1,,'ka');

  _mn1=compress(_mv1,,'kd');

  _mv2=scan(var,_i+1,';');

  _mc2=compress(_mv2,,'ka');

  _mn2=compress(_mv2,,'kd');

   if (_mc1 ne _mc2 or _mn2-_mn1>1) then

          do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

View solution in original post

10 REPLIES 10
Haikuo
Onyx | Level 15

Following code can also applied to more general condition, such as: D01;E01;

Limitation: Can only work on 3-character substring like this: D01; while it can be tweaked to accomodate more complex scenarios.

data have;

input var $40.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_:);

set have;

length _var VarFinal $40.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

  if _i=1 then _sv=scan(var,_i,';');

  _mv1=scan(var,_i,';');

  _mv2=scan(var,_i+1,';');

   if (first(_mv1) ne first(_mv2) or substr(_mv2,2,2)-substr(_mv1,2,2)>1) then

          do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Regards,

Haikuo

Haikuo
Onyx | Level 15

Here is one version that can be used for more general conditions, such as undefined length of substrings, undefined number of character or digits within the substrings.

data have;

input var $ 60.;

datalines4;

D01;D02;D03;D04;D05;AA101;AA102;AA103;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_:);

set have;

length _var VarFinal $100.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

  if _i=1 then _sv=scan(var,_i,';');

  _mv1=scan(var,_i,';');

  _mc1=compress(_mv1,,'ka');

  _mn1=compress(_mv1,,'kd');

  _mv2=scan(var,_i+1,';');

  _mc2=compress(_mv2,,'ka');

  _mn2=compress(_mv2,,'kd');

   if (_mc1 ne _mc2 or _mn2-_mn1>1) then

          do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Haikuo
Onyx | Level 15

Going further, here is another more general approach which works on unsorted substrings with the same variable:

data have;

input var $ 60.;

datalines4;

AA103;D01;D05;D03;D04;D02;AA101;AA102;

F01;D02;D03;D05;E01;D01;F02;

D1000;D11;D22;D13;D17;D20;D21;D12;

;;;;

proc sql;

select max(countw(var)) into :num from have;quit;

filename new_v "h:\temp\new_v.txt";

data _null_ ;

set have;

file new_v;

array v $10 _v1-_v%sysevalf(&num+1);

do _i=1 by 1 until (missing (scan(var,_i,';')));

v(_i)=scan(var,_i,';');

end;

call sortc(of v(*));

put v(*);

run;

data want (drop=_:) ;

infile new_v;

length _var VarFinal $100.;

input var : & $300.;

do _i=1 by 1 until (missing (scan(var,_i)));

  if _i=1 then _sv=scan(var,_i);

  _mv1=scan(var,_i);

  _mc1=compress(_mv1,,'ka');

  _mn1=compress(_mv1,,'kd');

  _mv2=scan(var,_i+1);

  _mc2=compress(_mv2,,'ka');

  _mn2=compress(_mv2,,'kd');

   if (_mc1 ne _mc2 or abs(_mn2-_mn1)>1 ) then

          do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Filipvdr
Pyrite | Level 9

thanks for all your answers

I have to calculate a second variable now:

I have the same strings:

D01;D02;D03;D04;D05;                                           should become DDDDD

D01;D02;D03;D05;                                                  should become DDD-D

D10;D11;D12;D13;D17;D20;D21;D22                        should become ---------DDDD---D--DDD;

Haikuo
Onyx | Level 15

Question:

The '-' in between groups, is it always just one? or quantiy of '-'s reflects the size of the gap?

Filipvdr
Pyrite | Level 9

the - reflects the size of the gap, so for each number that is not in the row there is a -

Haikuo
Onyx | Level 15

I see you already have an answer. Anyway, if pursuing the same logic that I have applied on the previous ones, here is one solution:

This only applied to the same character scenario, as I don't know your rules about transits between , say D02 to FF03, if there are such things in real life.

data have;

input var $40.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_:);

set have;

length  VarFinal $40.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

  if _i=1 then

      do;

           _sv=scan(var,_i,';');

           _svc=substr(_sv,1,1);

VarFinal=_svc;

end;

  _mv1=scan(var,_i,';');

  _mv2=scan(var,_i+1,';');

  _m21=substr(_mv2,2,2)-substr(_mv1,2,2);

  if _m21=1 then Varfinal=cats(Varfinal,_svc);

  else if _m21>1 then

        do;

do _k=1 by 1 to (_m21-1);

           VarFinal=cats(VarFinal,'-');

          end;VarFinal=cats(VarFinal,_svc);

end;

end;

run;

proc print;run;

FloydNevseta
Pyrite | Level 9

This one is actually easier than the original problem.

data have;

input var $ 60.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;

A10;A11;A12;A13;A17;A20;A21;A22;

;;;;

data want (drop=_:);

set have;

length newvar $100 _subvar $5 _gapchar $100;

do _i = 1 by 1 until ( missing( _subvar ) );

   _subvar = scan( var, _i, ';' );

   _prefix = compress( _subvar,, 'ka' );

   _suffix = compress( _subvar,, 'kd' );

   _lagsuffix = lag( _suffix );

   _gapchar = '';

   * check for gap;

   _gap = _suffix - coalesce(_lagsuffix,0) - 1;

   if _gap > 0 then _gapchar = repeat( '-', _gap-1 );

   newvar = cats( newvar, _gapchar, _prefix );

end;

run;

FloydNevseta
Pyrite | Level 9

Here is another variation using the LAG function:

data want (drop=_:);

set have;

length newvar $100 _subvar $5;

_isincrement = 0;

do _i = 1 by 1 until ( missing( _subvar ) );

   _subvar = scan( var, _i, ';' );

   _prefix = compress( _subvar,, 'ka' );

   _lagprefix = lag( _prefix );

   _suffix = compress( _subvar,, 'kd' );

   _lagsuffix = lag( _suffix );

   _lagsubvar = lag( _subvar );

   if _prefix = _lagprefix and _suffix = _lagsuffix + 1 then do;

      _isincrement = 1;

   end;

   else do;

      if _isincrement then newvar = catx( '-', newvar, _lagsubvar );

      newvar = catx( ';', newvar, _subvar );

      _isincrement = 0;

   end;

end;

run;

Ksharp
Super User

I prefer to Linlin's solution.

data have;
input var $40.;
datalines4;
D01;D02;D03;D04;D05;
D01;D02;D03;D05;
D10;D11;D12;D13;D17;D20;D21;D22;
;;;;
run;
data temp1(keep=group d num);
 set have;
 group+1;
  i=1;
  d=substr(scan(var,i,';'),1,1);
  num=input(substr(scan(var,i,';'),2),best8.);
  do while(not missing(d));
   output;
   i+1;
   d=substr(scan(var,i,';'),1,1);
   num=input(substr(scan(var,i,';'),2),best8.);
  end;
run;
proc sort data=temp1;by group num;run;
data temp2(keep=group num);
 set temp1(rename=(num=_num));
 by group;
 if last.group then do;
                     do num=1 to _num;
                           output;
                          end;
                         end;
run;
data x;
 merge temp1 temp2;
 by group num; 
 if missing(d) then d='-';
run;
data want(keep=have want);
 set x;
 by group;
 length have want $ 400;
 retain have want;
 if first.group then call missing(want,have);
 if d ne '-' then have=catx(';',have,cats(d,num));
 want=cats(want,d);
 if last.group then output;
run;


Ksharp

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 3236 views
  • 0 likes
  • 4 in conversation