BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
marianob
Obsidian | Level 7

Hello community!

I would like some help with this. I somehow did it manually, but I would want to learn how to do it with a more sophisticated code.

I need to split this grid into columns (up to 12 columns), with the following conditions:

 - If it is a "C" or a "-" it must go in a single column.

 - If it is a number, example "03" "09" "23", it must go in a single column.

 

Example:

 

ObsGRIDColumn1Column2Column3Column4Column 12
113131313130706050403--01CCCCCCCC13131313  
2CCCCCCCCCCCCCCCCCCCCCCCCCC  
31313023413130234  
41313135413131354  
5CC020101CCC01CC01CC01CC--CC0201  
6--C20133--C20  
7C0201CCC0201C  
8C030201CC030201  

 

Thanks in advance!!

1 ACCEPTED SOLUTION

Accepted Solutions
JeffMeyers
Barite | Level 11

Here's how I did it with part of your data:


data blah;
length x $200.;
x='13131313130706050403--01CCCCCCCC';output;
x='CCCCCCCCCCCCCCCCCCCCCC'; output;
x='13130234';output;
run;

data blah2;
set blah end=last;
    
    n_c=count(x,'C');
    n_dash=count(x,'-');
    if ^missing(compress(x,'-C')) then n_num=length(compress(x,'-C'))/2;
    else n_num=0;
    retain n_columns;
    if sum(n_c,n_dash,n_num)>n_columns then n_columns=sum(n_c,n_dash,n_num);
    
    if last then call symputx('ncol',n_columns);
run;


data blah3;
    set blah2;
    array column {&ncol} $2.;
    
    col=0;
    indx=1;
    do until (indx>length(x));
        if substr(x,indx,1) in('C' '-') then do;
            col=col+1;
            column(col)=substr(x,indx,1);
            indx=indx+1;
        end;
        else do;
            col=col+1;
            column(col)=substr(x,indx,2);
            indx=indx+2;
        end;
        end;
    drop col indx n_c n_dash n_num;
run;
        

The first parts are figuring out how large the array needs to be to hold all of the columns.  The last part is filling in the array by incrementing a substring function through and increasing the index based on the character it comes across.  This assumes all numbers are two digits.

View solution in original post

13 REPLIES 13
JeffMeyers
Barite | Level 11

Are all numbers two digits?

marianob
Obsidian | Level 7

Hi!

Yes, as the ones between 1 and 9 have leading 0s.

JeffMeyers
Barite | Level 11

Here's how I did it with part of your data:


data blah;
length x $200.;
x='13131313130706050403--01CCCCCCCC';output;
x='CCCCCCCCCCCCCCCCCCCCCC'; output;
x='13130234';output;
run;

data blah2;
set blah end=last;
    
    n_c=count(x,'C');
    n_dash=count(x,'-');
    if ^missing(compress(x,'-C')) then n_num=length(compress(x,'-C'))/2;
    else n_num=0;
    retain n_columns;
    if sum(n_c,n_dash,n_num)>n_columns then n_columns=sum(n_c,n_dash,n_num);
    
    if last then call symputx('ncol',n_columns);
run;


data blah3;
    set blah2;
    array column {&ncol} $2.;
    
    col=0;
    indx=1;
    do until (indx>length(x));
        if substr(x,indx,1) in('C' '-') then do;
            col=col+1;
            column(col)=substr(x,indx,1);
            indx=indx+1;
        end;
        else do;
            col=col+1;
            column(col)=substr(x,indx,2);
            indx=indx+2;
        end;
        end;
    drop col indx n_c n_dash n_num;
run;
        

The first parts are figuring out how large the array needs to be to hold all of the columns.  The last part is filling in the array by incrementing a substring function through and increasing the index based on the character it comes across.  This assumes all numbers are two digits.

marianob
Obsidian | Level 7

Thank you so much!

 

Just to learn it properly. I don't understand this part of the code. What is it for?

 

if ^missing(compress(x,'-C')) then n_num=length(compress(x,'-C'))/2;

 

Thanks again!

JeffMeyers
Barite | Level 11
If you take the length of a missing character it still returns 1. So what that logic statement is doing is checking first to see if there is any non-missing value after removing the C and - characters (COMPRESS function removes them). If it is not missing then the length function will tell us how many numbers are left (length of the string after removing the C and - characters). Otherwise the ELSE statement marks there as 0 numbers.
ChrisNZ
Tourmaline | Level 20

@JeffMeyers > take the length of a missing character it still returns 1.

Use the lengthn() function.

JeffMeyers
Barite | Level 11
Thanks, always good to learn something new.
ballardw
Super User

I think that you need to explicitly show us what the entire result for row 6 might be.

marianob
Obsidian | Level 7

Sorry, I manually modified that one in excel (leading 0 in the last "3" was missing). The correct grid for row 6 is:

 

6--C201303--C201303
ChrisNZ
Tourmaline | Level 20

Like this?

data WANT;
  set HAVE;
  array COL [12] $2; 
  POS=1;
  do COLNO = 1 to 12 while (char(GRID,POS) ne ' ');
    if '0' <=  char(GRID,POS) <= '9' then do;
      COL[COLNO] = substr(GRID,POS,2);
      POS+2;
    end;
    else do;
      COL[COLNO] = char(GRID,POS);
      POS+1;
    end;
  end;
run;
GRID COL1 COL2 COL3 COL4 COL5 COL6 COL7 COL8 COL9 COL10 COL11 COL12
13131313130706050403--01CCCCCCCC 13 13 13 13 13 07 06 05 04 03 - -
CCCCCCCCCCCCCCCCCCCCCC C C C C C C C C C C C C
13130234 13 13 02 34                
--C0234 - - C 02 34              

 

marianob
Obsidian | Level 7

It worked perfectly fine! Thanks!

r_behata
Barite | Level 11

Regular expression solution :

data have;
input grid : $50. ;
cards;
3131313130706050403--01CCCCCCCC
CCCCCCCCCCCCCCCCCCCCCC
13130234
13131354
CC020101CCC01CC01CC01CC--
--C20133
C0201CC
C030201C
;
run;

data want(rename=(grid_=grid));
length grid_ $50;
	set have;

/*REGEX pattern explained*/
 /*/(\d{2}|\w{1}|\W{1})/io*/
/*1st Capturing Group (\d{2}|\w{1}|\W{1})*/
/*1st Alternative \d{2}*/
/*\d{2} matches a digit (equal to [0-9])*/
/*{2} Quantifier — Matches exactly 2 times*/
/*2nd Alternative \w{1}*/
/*\w{1} matches any word character (equal to [a-zA-Z0-9_])*/
/*{1} Quantifier — Matches exactly one time (meaningless quantifier)*/
/*3rd Alternative \W{1}*/
/*\W{1} matches any non-word character (equal to [^a-zA-Z0-9_])*/
/*{1} Quantifier — Matches exactly one time */
 patternID = prxparse('/\d{2}|\w{1}|\W{1}/io');

   array col[*] $ col1-col12;
   grid_=grid;
   do i=1 to length(grid);
		call prxsubstr(patternID,grid,pos,len);
		col[i]=substr(grid,pos,len);
		grid=substr(grid,pos+len);
		if i eq 12 then leave;
	end;
drop grid patternID pos len i;
run;

marianob
Obsidian | Level 7
Many thanks!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1144 views
  • 5 likes
  • 5 in conversation