DATA Step, Macro, Functions and more

creating D02-D05 from D02;D03;D04;D05 and more complicated things

Accepted Solution Solved
Reply
Regular Contributor
Posts: 229
Accepted Solution

creating D02-D05 from D02;D03;D04;D05 and more complicated things

Hello,

i have a variable separated by ;

Examples :

D01;D02;D03;D04;D05;                                           should become D01-D05

D01;D02;D03;D05;                                                  should become D01-D03;D05

D10;D11;D12;D13;D17;D20;D21;D22                        should become D10-D13;D17;D20-D22;

Any help to get me on my way?

Thanks! 


Accepted Solutions
Solution
‎03-22-2012 01:05 PM
Respected Advisor
Posts: 3,124

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Here is one version that can be used for more general conditions, such as undefined length of substrings, undefined number of character or digits within the substrings.

data have;

input var $ 60.;

datalines4;

D01;D02;D03;D04;D05;AA101;AA102;AA103;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_Smiley Happy;

set have;

length _var VarFinal $100.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

  if _i=1 then _sv=scan(var,_i,';');

  _mv1=scan(var,_i,';');

  _mc1=compress(_mv1,,'ka');

  _mn1=compress(_mv1,,'kd');

  _mv2=scan(var,_i+1,';');

  _mc2=compress(_mv2,,'ka');

  _mn2=compress(_mv2,,'kd');

   if (_mc1 ne _mc2 or _mn2-_mn1>1) then

          do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Following code can also applied to more general condition, such as: D01;E01;

Limitation: Can only work on 3-character substring like this: D01; while it can be tweaked to accomodate more complex scenarios.

data have;

input var $40.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_Smiley Happy;

set have;

length _var VarFinal $40.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

  if _i=1 then _sv=scan(var,_i,';');

  _mv1=scan(var,_i,';');

  _mv2=scan(var,_i+1,';');

   if (first(_mv1) ne first(_mv2) or substr(_mv2,2,2)-substr(_mv1,2,2)>1) then

          do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Regards,

Haikuo

Solution
‎03-22-2012 01:05 PM
Respected Advisor
Posts: 3,124

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Here is one version that can be used for more general conditions, such as undefined length of substrings, undefined number of character or digits within the substrings.

data have;

input var $ 60.;

datalines4;

D01;D02;D03;D04;D05;AA101;AA102;AA103;

D01;D02;D03;D05;E01;F01;F02;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_Smiley Happy;

set have;

length _var VarFinal $100.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

  if _i=1 then _sv=scan(var,_i,';');

  _mv1=scan(var,_i,';');

  _mc1=compress(_mv1,,'ka');

  _mn1=compress(_mv1,,'kd');

  _mv2=scan(var,_i+1,';');

  _mc2=compress(_mv2,,'ka');

  _mn2=compress(_mv2,,'kd');

   if (_mc1 ne _mc2 or _mn2-_mn1>1) then

          do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Respected Advisor
Posts: 3,124

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Going further, here is another more general approach which works on unsorted substrings with the same variable:

data have;

input var $ 60.;

datalines4;

AA103;D01;D05;D03;D04;D02;AA101;AA102;

F01;D02;D03;D05;E01;D01;F02;

D1000;D11;D22;D13;D17;D20;D21;D12;

;;;;

proc sql;

select max(countw(var)) into :num from have;quit;

filename new_v "h:\temp\new_v.txt";

data _null_ ;

set have;

file new_v;

array v $10 _v1-_v%sysevalf(&num+1);

do _i=1 by 1 until (missing (scan(var,_i,';')));

v(_i)=scan(var,_i,';');

end;

call sortc(of v(*));

put v(*);

run;

data want (drop=_Smiley Happy ;

infile new_v;

length _var VarFinal $100.;

input var : & $300.;

do _i=1 by 1 until (missing (scan(var,_i)));

  if _i=1 then _sv=scan(var,_i);

  _mv1=scan(var,_i);

  _mc1=compress(_mv1,,'ka');

  _mn1=compress(_mv1,,'kd');

  _mv2=scan(var,_i+1);

  _mc2=compress(_mv2,,'ka');

  _mn2=compress(_mv2,,'kd');

   if (_mc1 ne _mc2 or abs(_mn2-_mn1)>1 ) then

          do;

if _sv = _mv1 then _var=_sv;

else _var=catx('-',_sv,_mv1);

_sv=_mv2;

end;

if not missing(_var) then varFinal=catx(';',varFinal,_var);

call missing(_var);

end;

run;

proc print;run;

Regular Contributor
Posts: 229

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

thanks for all your answers

I have to calculate a second variable now:

I have the same strings:

D01;D02;D03;D04;D05;                                           should become DDDDD

D01;D02;D03;D05;                                                  should become DDD-D

D10;D11;D12;D13;D17;D20;D21;D22                        should become ---------DDDD---D--DDD;

Respected Advisor
Posts: 3,124

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Question:

The '-' in between groups, is it always just one? or quantiy of '-'s reflects the size of the gap?

Regular Contributor
Posts: 229

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

the - reflects the size of the gap, so for each number that is not in the row there is a -

Respected Advisor
Posts: 3,124

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

I see you already have an answer. Anyway, if pursuing the same logic that I have applied on the previous ones, here is one solution:

This only applied to the same character scenario, as I don't know your rules about transits between , say D02 to FF03, if there are such things in real life.

data have;

input var $40.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;

D10;D11;D12;D13;D17;D20;D21;D22;

;;;;

data want (drop=_Smiley Happy;

set have;

length  VarFinal $40.;

do _i=1 by 1 until (missing (scan(var,_i,';')));

  if _i=1 then

      do;

           _sv=scan(var,_i,';');

           _svc=substr(_sv,1,1);

VarFinal=_svc;

end;

  _mv1=scan(var,_i,';');

  _mv2=scan(var,_i+1,';');

  _m21=substr(_mv2,2,2)-substr(_mv1,2,2);

  if _m21=1 then Varfinal=cats(Varfinal,_svc);

  else if _m21>1 then

        do;

do _k=1 by 1 to (_m21-1);

           VarFinal=cats(VarFinal,'-');

          end;VarFinal=cats(VarFinal,_svc);

end;

end;

run;

proc print;run;

Frequent Contributor
Posts: 101

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

This one is actually easier than the original problem.

data have;

input var $ 60.;

datalines4;

D01;D02;D03;D04;D05;

D01;D02;D03;D05;

A10;A11;A12;A13;A17;A20;A21;A22;

;;;;

data want (drop=_Smiley Happy;

set have;

length newvar $100 _subvar $5 _gapchar $100;

do _i = 1 by 1 until ( missing( _subvar ) );

   _subvar = scan( var, _i, ';' );

   _prefix = compress( _subvar,, 'ka' );

   _suffix = compress( _subvar,, 'kd' );

   _lagsuffix = lag( _suffix );

   _gapchar = '';

   * check for gap;

   _gap = _suffix - coalesce(_lagsuffix,0) - 1;

   if _gap > 0 then _gapchar = repeat( '-', _gap-1 );

   newvar = cats( newvar, _gapchar, _prefix );

end;

run;

Frequent Contributor
Posts: 101

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

Here is another variation using the LAG function:

data want (drop=_Smiley Happy;

set have;

length newvar $100 _subvar $5;

_isincrement = 0;

do _i = 1 by 1 until ( missing( _subvar ) );

   _subvar = scan( var, _i, ';' );

   _prefix = compress( _subvar,, 'ka' );

   _lagprefix = lag( _prefix );

   _suffix = compress( _subvar,, 'kd' );

   _lagsuffix = lag( _suffix );

   _lagsubvar = lag( _subvar );

   if _prefix = _lagprefix and _suffix = _lagsuffix + 1 then do;

      _isincrement = 1;

   end;

   else do;

      if _isincrement then newvar = catx( '-', newvar, _lagsubvar );

      newvar = catx( ';', newvar, _subvar );

      _isincrement = 0;

   end;

end;

run;

Super User
Posts: 9,662

Re: creating D02-D05 from D02;D03;D04;D05 and more complicated things

I prefer to Linlin's solution.

data have;
input var $40.;
datalines4;
D01;D02;D03;D04;D05;
D01;D02;D03;D05;
D10;D11;D12;D13;D17;D20;D21;D22;
;;;;
run;
data temp1(keep=group d num);
 set have;
 group+1;
  i=1;
  d=substr(scan(var,i,';'),1,1);
  num=input(substr(scan(var,i,';'),2),best8.);
  do while(not missing(d));
   output;
   i+1;
   d=substr(scan(var,i,';'),1,1);
   num=input(substr(scan(var,i,';'),2),best8.);
  end;
run;
proc sort data=temp1;by group num;run;
data temp2(keep=group num);
 set temp1(rename=(num=_num));
 by group;
 if last.group then do;
                     do num=1 to _num;
                           output;
                          end;
                         end;
run;
data x;
 merge temp1 temp2;
 by group num; 
 if missing(d) then d='-';
run;
data want(keep=have want);
 set x;
 by group;
 length have want $ 400;
 retain have want;
 if first.group then call missing(want,have);
 if d ne '-' then have=catx(';',have,cats(d,num));
 want=cats(want,d);
 if last.group then output;
run;


Ksharp

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 602 views
  • 0 likes
  • 4 in conversation