Re: Splitting a delimited column into multiple columns in order

Claudia_SAS · Posted 05-20-2020 12:07 AM

Hello,

I have a table which contains 2 columns:

What I want is to split the second column into multiple columns(my delimiter is space) but in order.

Therefore my new table should look just like this:

Country OPT1 OPT2 OTP3 OTP4 OPT5............. OPTn

US #02 #05 #38 Z82

US #02 #16 #33 ZY1

.

US 1SZ

Any ideas?

Thank you,

Claudia.

ChicoFTB · Posted 05-20-2020 08:56 AM

Claudia, this post might be helpful!

https://communities.sas.com/t5/SAS-Procedures/Splitting-a-delimited-column-into-multiple-columns/td-...

Claudia_SAS · Posted 05-20-2020 09:15 AM

My dataset is string, and I have no numeric variables.
I need to keep the # or 0 when leads the variable.

Tom · Posted 05-20-2020 09:19 AM

@Claudia_SAS wrote:
My dataset is string, and I have no numeric variables.
I need to keep the # or 0 when leads the variable.

Please try the logic in the linked answer on your data there is nothing in that answer that has anything to do with numeric variables. Post the SAS log if you have any problems.

Claudia_SAS · Posted 05-20-2020 11:01 AM

My code is:

/****************************************************

data split;
input rpo $743.;
datalines;

;
data split2 (drop=rpo i);
set test;
length opt1-opt186 $3;
array opt(186) $;
do i = 1 to dim(opt);
opt[i]=scan(rpo,i,' ','M');
end;
run;

/******************************************************

But, this is not what i was looking for

What I am looking for is to put variable "#AA" in a different column and not under the same column as "06D"

I don't know if I make any sense....I was trying to many ways and nothing came the way I wanted.

In the end I want to transpose the columns as leading headers and when I have a blank that should be counted as "0" and when I have a value that should be counted as "1".

Something like this...

Country	#AA	06D	1D3	1SZ	2LZ	4AA	6X1
US	0	1	1	1	1	1	1
US	0	1	1	1	1	1	1
US	1	1	1	1	1	1	0

Please help!

ballardw · Posted 05-20-2020 11:00 AM

Do you know what the number of OPT variables is that needs to be created? What is the length of the longest OPT value?

Do you expect the same OPT value to be in the same OPT variable for each record?

Your posted "example" of what you seems to imply that each value should be in the same variable. Which leads to the question of just how many VALUES need to be considered? Do you have a list of them somewhere, like another data set?

Claudia_SAS · Posted 05-20-2020 11:13 AM

thank you for your reply .....yes...I calculated the length of the rpo column, and then the length of opt(i) always equals to 3, delimiter is always one space.

Claudia_SAS · Posted 05-20-2020 11:16 AM

I should've mentioned that the length of rpo column is not the same for each row, in my case I picked the highest number and created as many columns I needed.

ballardw · Posted 05-20-2020 11:52 AM

@Claudia_SAS wrote:
thank you for your reply .....yes...I calculated the length of the rpo column, and then the length of opt(i) always equals to 3, delimiter is always one space.

Not a trivial exercise but not extremely difficult. However a more important question may be what you will do with all of those variables when created. For many types of analysis a single variable with different values is sufficient.

So here are the steps involved to one approach:

1) get a list of ALL of the codes involved. If you have others appear in a later data step that will be a serious complication to your "in order". If you do not have such a list already and have to create one from your existing data the first step is go from one wide variable to a bunch of single values.

2) create a temporary array in a data step with the values

3) identify which order each value in the list has from the temporary array

4) assign the variable the value.

Here is one way. Note the example data is small because I am not going to TYPE from pictures a larger number of values.

data start;
   infile datalines truncover;
   input country $ opt $20.;
datalines;
US #02 #05 #38 Z82
US #02 #16 #33 ZY1
;
/* get each code value as a single record*/
data temp;
   set start;
   length o $ 3;
   do i=1 to countw(opt);
      o = scan(opt,i);
      output;
   end;
   keep o;
run;
/* create ordered list of codes suitable for use in 
   temporary array definition
*/
Proc sql noprint;
   select distinct quote(o) into: codelist separated by ','
   from temp;
quit;
%let codecount=&sqlobs;

%put list of codes = &codelist.;
%put number of codes = &codecount.;

data want;
   set start;
   array opt_(&codecount.) $ 3 opt1 - opt&codecount. ;
   array t(&codecount.) $ 3 _temporary_( &codelist);
   do i= 1 to countw(opt);
      opt_(whichc(scan(opt,i),of t(*))) = scan(opt,i);
   end;
   drop i;
run;

Some key elements. The temp data set gets all of the values as a single record per value.

The Proc sql select distinct will get the unique values, and by default in an order. If you expect the aphabetic Z82 to appear before #02 that will add some work. There are two macro variables created that you can examine in the log that will be used to write part of the code needed for the last step. The &sqlobs is an automatic SAS variable that returns the number of observations used in the immediately previous Proc SQL step. Assign the value to a known variable so you can retrieve the value later.

The macro variable codecount has how many codes are appear in the data. This is the issue if you run this later with other data that the value of OPTx may change because of the sorting if new code values are added.

In the last data step we use the the codecount to set the size of arrays and create the variable names in the array statement.

The marginally ugly code in the do loop involving WHICHC is a function that finds the position of a value in the list of values. In this case the list of values is the temporary array T and the "of t(*)" say to search all of the values in the array. It returns the number of the found element.

Another approach would be create numeric 0/1 coded variables assigning labels to the variables Opt1 to Optn of the value represented, with 1 indicating "found" and 0 "not found". Which coding makes more sense depends on how you intend to use the data.

Claudia_SAS · Posted 05-20-2020 01:53 PM

Actually I did an "orthodox" code and it worked just perfect.
data split;
input rpo $743.;
datalines;

;
data split2 (drop=rpo i);
set test;
length opt1-opt186 $3;
array opt(186) $;
do i = 1 to dim(opt);
opt[i]=scan(rpo,i,' ','M');
end;
run;
proc transpose data=split2 out=split3(drop=_NAME_ rename=(Col1_=opt));
by VIN ;
VAR opt1 opt2 opt3 opt4 opt5 opt6 opt7 opt8 opt9 opt10 opt11 opt12 opt13 opt14 opt15 opt16 opt17 opt18 opt19 opt20 opt21 opt22 opt23 opt24 opt25 opt26 opt27 opt28 opt29 opt30 opt31 opt32 opt33 opt34 opt35 opt36 opt37 opt38 opt39 opt40 opt41 opt42
opt43 opt44 opt45 opt46 opt47 opt48 opt49 opt50 opt51 opt52 opt53 opt54 opt55 opt56 opt57 opt58 opt59 opt60 opt61 opt62 opt63 opt64 opt65 opt66 opt67 opt68 opt69 opt70 opt71 opt72 opt73 opt74 opt75 opt76 opt77 opt78 opt79 opt80 opt81 opt82 opt83
opt84 opt85 opt86 opt87 opt88 opt89 opt90 opt91 opt92 opt93 opt94 opt95 opt96 opt97 opt98 opt99 opt100 opt101 opt102 opt103 opt104 opt105 opt106 opt107 opt108 opt109 opt110 opt111 opt112 opt113 opt114 opt115 opt116 opt117 opt118 opt119 opt120
opt121 opt122 opt123 opt124 opt125 opt126 opt127 opt128 opt129 opt130 opt131 opt132 opt133 opt134 opt135 opt136 opt137 opt138 opt139 opt140 opt141 opt142 opt143 opt144 opt145 opt146 opt147 opt148 opt149 opt150 opt151 opt152 opt153 opt154 opt155
opt156 opt157 opt158 opt159 opt160 opt161 opt162 opt163 opt164 opt165 opt166 opt167 opt168 opt169 opt170 opt171 opt172 opt173 opt174 opt175 opt176 opt177 opt178 opt179 opt180 opt181 opt182 opt183 opt184 opt185 opt186;

run;

data split4;
set split3;
value = 1;
run;

proc transpose data=split4 out=want;
by VIN;
id opt;
var value;
run;

proc stdize data=want out=want missing=0 reponly;
run;

Registration is open

SAS Training: Just a Click Away