DATA Step, Macro, Functions and more

Split h1-h10 into h1,h2,h3,h4,.... h10

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

Split h1-h10 into h1,h2,h3,h4,.... h10

I want to translate the contents of a column.

For example, the content includes the following value (this is the content of 1 column in 1 row):

G1, G2, G4, X1 -X3, H51-H55, T10

I want to translate that to different rows.

The result should look like this:

G1

G2

G4

X1

X2

X3

H51

H52

H53

H54

H55

T10

I have not problems making rows of the single ones (I mean G1 G2 G4 and T10). The function scan works perfectly.

The problem is how I translate X1-X3 into X1, X2, X3 and the same for H51-H55.


Accepted Solutions
Solution
‎09-02-2013 01:21 PM
Respected Advisor
Posts: 3,156

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to data_null__

Yes, DN, can never go under your radar -Smiley Happy. ANYFIRST should have a shot, while at this stage it seems to me using PRX is more straightforward.

data test;

input var $ 80.;

length new_var $ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

   if find(_var,'-') then do;

     _v1=scan(_var,1,'- ');

  _v2=scan(_var,2,'- ');

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

           new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);

           output;

     end;

  end;

   else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103

;

Haikuo

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to Richardvan_tHoff

With a combination of bunch of Char functions, we can have something like this:

data test;

input var $ 50.;

length new_var $ 8;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

   if find(_var,'-') then

     do _i=compress(scan(_var,1,'-'),,'kd') to compress(scan(_var,2,'-'),,'kd');

           new_var=cats(compress(scan(_var,1,'-'),,'d'),_i);

           output;

     end;

   else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10

;

Haikuo

Respected Advisor
Posts: 3,799

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

What if a word has an imbedded digit? NA3ME1-NA3ME1

Consider using the ANYFIRST function with negative start.

Solution
‎09-02-2013 01:21 PM
Respected Advisor
Posts: 3,156

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to data_null__

Yes, DN, can never go under your radar -Smiley Happy. ANYFIRST should have a shot, while at this stage it seems to me using PRX is more straightforward.

data test;

input var $ 80.;

length new_var $ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

   if find(_var,'-') then do;

     _v1=scan(_var,1,'- ');

  _v2=scan(_var,2,'- ');

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

           new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);

           output;

     end;

  end;

   else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103

;

Haikuo

Occasional Contributor
Posts: 12

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Thanks all. I am using the code already from hai.kuo. Only made a minor change, because the variable could also contain only numbers.

data test;

input var $ 80.;

length new_var $ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

   if find(_var,'-') then do;

     _v1=scan(_var,1,'- ');

     _v2=scan(_var,2,'- ');

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

         IF(prxmatch("/\D+/",strip(_v1))) THEN

           new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);

         ELSE

           new_var=_i;

           output;

     end;

  end;

   else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103, 1-10

;

The only problem I am trying to solve is when you have H01-H10. The above code would split it in the below rows

H1

H2

H3

H4

...

H10

While it should be

H01

H02

H03

H04

...

H10

Super Contributor
Posts: 339

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to Richardvan_tHoff

That's actually fairly tricky to add within a single data step since you can't get to enforce a zn.w format. I see 2 alternatives

1: Use PRX to capture number of leading 0s from your lower boundary and then control the number of those that are output based on your _i counter (like remove one 0 from the captured lead for each increment in log10(_i)). It would resemble something like this (untested)

length lead $20.; /*just so that prxchange output does not get converted to numeric*/

...

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

         IF(prxmatch("/\D+/",strip(_v1))) THEN

            lead=prxchange('s/(\w*\D+)(0*)(\d+)/$2/',-1,_v1); /*you could change \d+ to [1-9]\d+ or something to clearly identify the purpose of 0* but since * is greedy its only a minor efficiency issue */

           new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),  length), substrn(left(lead), 1, length(trim(left(lead)))-(floor(log10(_i))-floor(log10(prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)))) )            ,_i);


...

drop lead;

2: Another way would build a series of case based on the length of the fully captured digit as a string for the lower bound and use put& zn.w formats e.g.

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

         IF(prxmatch("/\D+/",strip(_v1))) THEN

               zlen = length(prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1));

          if zlen=1 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);

          else if zlen=2 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),put(_i, z2.);

          ...

          else if zlen=8 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),put(_i, z8.));

PROC Star
Posts: 7,492

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to Richardvan_tHoff

: The method I suggested will retain the leading zeros if present in the variable list.

Respected Advisor
Posts: 3,799

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Arthur Tabachneck wrote:

Richard van't Hoff: The method I suggested will retain the leading zeros if present in the variable list.

I think there will be a problem defining variables with name like (at least in an input statement).

34WE100-34WE103, 1-10

I could not get that to work without making the words name literials.  Of course it wouldn't be too hard to add some code to convert those words to name literials.

Respected Advisor
Posts: 3,799

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to data_null__

Maybe this.

options validvarname=any;
%let list = G1, G2, G4, X01 -X03, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103 1-10 009-005;

data _null_;
  
length list $256 w $32 newlist $256;
  
list = transtrn(translate(symget('LIST'),' ',','),'-',' - ');
   do i = 1 by 1;
      w = scan(list,i,
' ');
      if missing(w) then leave;
      if w ne '-' then w = nliteral(w);
      newlist = catx(' ',newlist,w);
      end;
  
call symputx('LIST',newlist);
   put _all_;
  
run;

data names;
   input &list;
   stop;
  
cards;
;;;;
  
run;

proc transpose data=names out=names2;
   var _all_;
   run;
proc print;
  
run;
PROC Star
Posts: 7,492

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to data_null__

I didn't even notice the 1-10.  Hopefully, that was just a typo by the OP.  I was simply addressing the case like T01-T10.

Occasional Contributor
Posts: 12

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Thanks all.

I finally got it working for everything (even for my last problem).

My solution

data work.testje;

input var $ 100.;

length new_var $ 20;

do _j=1 to countw(var,',');

  

   _var=scan(var,_j,',');

   if find(_var,'-') then do;

     _v1=scan(_var,1,'- ');

     _v2=scan(_var,2,'- ');

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

    _len_new=LENGTH(STRIP(_i));

    _len_old=LENGTH(SUBSTR(STRIP(_v1),(LENGTH(STRIP(_v1))-ANYALPHA(STRIP(REVERSE(_v1)))) +2));

    _new_i=REPEAT('0',_len_old - _len_new - 1)||STRIP(_i);

    _turned=REVERSE(STRIP(_v1));

    SUBSTR(_turned,1,_len_old)=REVERSE(STRIP(_new_i));

    new_var=REVERSE(STRIP(_turned));

       output;

     end;

  end;

   else do;new_var=left(_var);output;end;

end;

cards;

G1, G2, G4, X1 -X3, H051-H055, T10,NA03ME1-NA03ME3, 34WE100-34WE103,001-008, 1-10, H0001- H0010

Respected Advisor
Posts: 3,799

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to Richardvan_tHoff

These look like variable lists.  If they are you might be able to use that to your advantage.

PROC Star
Posts: 7,492

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to Richardvan_tHoff

I agree with datanull.  If they are variable lists, you could use something like the following to convert them into a macro variable and then scan the macro variable.  e.g.:

data have;

  informat contents $80.;

  input contents &;

  cards;

G1, G2, G4, X1 -X3, H51-H55, T10

;

data _null_;

  set have;

  contents=translate(contents,' ',',');

  call symput('varlist',contents);

run;

data test;

  input &varlist.;

  cards;

;

run;

proc sql noprint;

  select name

    into :vars separated by " "

      from dictionary.columns

        where libname="WORK" and

              memname="TEST"

  ;

quit;

%put &vars.;

Super User
Posts: 5,518

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Art,

Got this from Tom, who got it from DN, who got it elsewhere ...

An alternative ending to your solution maintains the order of the names, and produces output as a single column.  Instead of reading from DICTIONARY.COLUMNS, try:

proc transpose data=test (obs=0) out=want;

   var &varlist;

run;

PROC Star
Posts: 7,492

Re: Split h1-h10 into h1,h2,h3,h4,.... h10

Posted in reply to Astounding

: Yes, I was familiar with the method, but not sure of its origin either.  Tom, Randy Herbison, FriedEgg and I have a nifty use of it in a paper we're presenting at the MWSUG meeting later this month.  Take a look at: http://www.sascommunity.org/mwiki/images/c/c5/S1-14-2013.pdf

However, I usually like the SQL method more often as one can order the variables based on any variable that is contained in dictionary.columns, including name and column number.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 444 views
  • 0 likes
  • 6 in conversation