BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Richardvan_tHoff
Obsidian | Level 7

I want to translate the contents of a column.

For example, the content includes the following value (this is the content of 1 column in 1 row):

G1, G2, G4, X1 -X3, H51-H55, T10

I want to translate that to different rows.

The result should look like this:

G1

G2

G4

X1

X2

X3

H51

H52

H53

H54

H55

T10

I have not problems making rows of the single ones (I mean G1 G2 G4 and T10). The function scan works perfectly.

The problem is how I translate X1-X3 into X1, X2, X3 and the same for H51-H55.

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Yes, DN, can never go under your radar -:). ANYFIRST should have a shot, while at this stage it seems to me using PRX is more straightforward.

data test;

input var $ 80.;

length new_var $ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

   if find(_var,'-') then do;

     _v1=scan(_var,1,'- ');

  _v2=scan(_var,2,'- ');

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

           new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);

           output;

     end;

  end;

   else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103

;

Haikuo

View solution in original post

14 REPLIES 14
Haikuo
Onyx | Level 15

With a combination of bunch of Char functions, we can have something like this:

data test;

input var $ 50.;

length new_var $ 8;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

   if find(_var,'-') then

     do _i=compress(scan(_var,1,'-'),,'kd') to compress(scan(_var,2,'-'),,'kd');

           new_var=cats(compress(scan(_var,1,'-'),,'d'),_i);

           output;

     end;

   else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10

;

Haikuo

data_null__
Jade | Level 19

What if a word has an imbedded digit? NA3ME1-NA3ME1

Consider using the ANYFIRST function with negative start.

Haikuo
Onyx | Level 15

Yes, DN, can never go under your radar -:). ANYFIRST should have a shot, while at this stage it seems to me using PRX is more straightforward.

data test;

input var $ 80.;

length new_var $ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

   if find(_var,'-') then do;

     _v1=scan(_var,1,'- ');

  _v2=scan(_var,2,'- ');

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

           new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);

           output;

     end;

  end;

   else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103

;

Haikuo

Richardvan_tHoff
Obsidian | Level 7

Thanks all. I am using the code already from hai.kuo. Only made a minor change, because the variable could also contain only numbers.

data test;

input var $ 80.;

length new_var $ 20;

do _j=1 to countw(var,',');

_var=scan(var,_j,',');

   if find(_var,'-') then do;

     _v1=scan(_var,1,'- ');

     _v2=scan(_var,2,'- ');

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

         IF(prxmatch("/\D+/",strip(_v1))) THEN

           new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);

         ELSE

           new_var=_i;

           output;

     end;

  end;

   else do;new_var=left(_var);output;end;

end;

drop _:;

cards;

G1, G2, G4, X1 -X3, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103, 1-10

;

The only problem I am trying to solve is when you have H01-H10. The above code would split it in the below rows

H1

H2

H3

H4

...

H10

While it should be

H01

H02

H03

H04

...

H10

Vince28_Statcan
Quartz | Level 8

That's actually fairly tricky to add within a single data step since you can't get to enforce a zn.w format. I see 2 alternatives

1: Use PRX to capture number of leading 0s from your lower boundary and then control the number of those that are output based on your _i counter (like remove one 0 from the captured lead for each increment in log10(_i)). It would resemble something like this (untested)

length lead $20.; /*just so that prxchange output does not get converted to numeric*/

...

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

         IF(prxmatch("/\D+/",strip(_v1))) THEN

            lead=prxchange('s/(\w*\D+)(0*)(\d+)/$2/',-1,_v1); /*you could change \d+ to [1-9]\d+ or something to clearly identify the purpose of 0* but since * is greedy its only a minor efficiency issue */

           new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),  length), substrn(left(lead), 1, length(trim(left(lead)))-(floor(log10(_i))-floor(log10(prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)))) )            ,_i);


...

drop lead;

2: Another way would build a series of case based on the length of the fully captured digit as a string for the lower bound and use put& zn.w formats e.g.

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

         IF(prxmatch("/\D+/",strip(_v1))) THEN

               zlen = length(prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1));

          if zlen=1 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),_i);

          else if zlen=2 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),put(_i, z2.);

          ...

          else if zlen=8 then new_var=cats(prxchange('s/(\w*\D+)(\d+)/$1/',-1,_v1),put(_i, z8.));

art297
Opal | Level 21

: The method I suggested will retain the leading zeros if present in the variable list.

data_null__
Jade | Level 19

Arthur Tabachneck wrote:

Richard van't Hoff: The method I suggested will retain the leading zeros if present in the variable list.

I think there will be a problem defining variables with name like (at least in an input statement).

34WE100-34WE103, 1-10

I could not get that to work without making the words name literials.  Of course it wouldn't be too hard to add some code to convert those words to name literials.

data_null__
Jade | Level 19

Maybe this.

options validvarname=any;
%let list = G1, G2, G4, X01 -X03, H51-H55, T10,NA3ME1-NA3ME3, 34WE100-34WE103 1-10 009-005;

data _null_;
  
length list $256 w $32 newlist $256;
  
list = transtrn(translate(symget('LIST'),' ',','),'-',' - ');
   do i = 1 by 1;
      w = scan(list,i,
' ');
      if missing(w) then leave;
      if w ne '-' then w = nliteral(w);
      newlist = catx(' ',newlist,w);
      end;
  
call symputx('LIST',newlist);
   put _all_;
  
run;

data names;
   input &list;
   stop;
  
cards;
;;;;
  
run;

proc transpose data=names out=names2;
   var _all_;
   run;
proc print;
  
run;
art297
Opal | Level 21

I didn't even notice the 1-10.  Hopefully, that was just a typo by the OP.  I was simply addressing the case like T01-T10.

Richardvan_tHoff
Obsidian | Level 7

Thanks all.

I finally got it working for everything (even for my last problem).

My solution

data work.testje;

input var $ 100.;

length new_var $ 20;

do _j=1 to countw(var,',');

  

   _var=scan(var,_j,',');

   if find(_var,'-') then do;

     _v1=scan(_var,1,'- ');

     _v2=scan(_var,2,'- ');

     do _i=prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v1)  to prxchange('s/(\w*\D+)(\d+)/$2/',-1,_v2);

    _len_new=LENGTH(STRIP(_i));

    _len_old=LENGTH(SUBSTR(STRIP(_v1),(LENGTH(STRIP(_v1))-ANYALPHA(STRIP(REVERSE(_v1)))) +2));

    _new_i=REPEAT('0',_len_old - _len_new - 1)||STRIP(_i);

    _turned=REVERSE(STRIP(_v1));

    SUBSTR(_turned,1,_len_old)=REVERSE(STRIP(_new_i));

    new_var=REVERSE(STRIP(_turned));

       output;

     end;

  end;

   else do;new_var=left(_var);output;end;

end;

cards;

G1, G2, G4, X1 -X3, H051-H055, T10,NA03ME1-NA03ME3, 34WE100-34WE103,001-008, 1-10, H0001- H0010

data_null__
Jade | Level 19

These look like variable lists.  If they are you might be able to use that to your advantage.

art297
Opal | Level 21

I agree with datanull.  If they are variable lists, you could use something like the following to convert them into a macro variable and then scan the macro variable.  e.g.:

data have;

  informat contents $80.;

  input contents &;

  cards;

G1, G2, G4, X1 -X3, H51-H55, T10

;

data _null_;

  set have;

  contents=translate(contents,' ',',');

  call symput('varlist',contents);

run;

data test;

  input &varlist.;

  cards;

;

run;

proc sql noprint;

  select name

    into :vars separated by " "

      from dictionary.columns

        where libname="WORK" and

              memname="TEST"

  ;

quit;

%put &vars.;

Astounding
PROC Star

Art,

Got this from Tom, who got it from DN, who got it elsewhere ...

An alternative ending to your solution maintains the order of the names, and produces output as a single column.  Instead of reading from DICTIONARY.COLUMNS, try:

proc transpose data=test (obs=0) out=want;

   var &varlist;

run;

art297
Opal | Level 21

: Yes, I was familiar with the method, but not sure of its origin either.  Tom, Randy Herbison, FriedEgg and I have a nifty use of it in a paper we're presenting at the MWSUG meeting later this month.  Take a look at: http://www.sascommunity.org/mwiki/images/c/c5/S1-14-2013.pdf

However, I usually like the SQL method more often as one can order the variables based on any variable that is contained in dictionary.columns, including name and column number.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 2465 views
  • 0 likes
  • 6 in conversation