SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to split a column into multiple columns based on string delimiting

Accepted Solution Solved
Reply
Contributor
Posts: 48
Accepted Solution

How to split a column into multiple columns based on string delimiting

Hi SAS Community,

 

I would like your help insolving the below problem.

Dataset in question:-

 

Product codeClass of drug
a1

abc [EPC],

def [MoA]

a2

ghi [EPC],

jkl  [MoA]

a3

mno [EPC],

pqr [EPC],

abo [MoA]

a4

stu [EPC],

vw [CI],

xyz [MoA]

 

Expected output:-

 

PRODUCTNDCPHARM_CLASSESEPC1EPC2 MoA1 MoA2 CI
a1

abc [EPC],

def [MoA]

abc  [EPC]. def[MoA] . .
a2

ghi [EPC],

jkl  [MoA]

ghi[EPC]. jkl[MoA] . .
a3

mno [EPC],

pqr [EPC],

abo [MoA]

mno[EPC]pqr [EPC] abo[MoA] . .
a4

stu [EPC],

vw [CI],

xyz [MoA]

stu [EPC].vw[MoA] . xyz [CI]

 

I tried using Scan option in a data statement and there was no success. I also used ',' as a dlm to split the variable but the position of EPC, MoA and CI is not same across the dataset.

 

Please Help.

 

 


Accepted Solutions
Solution
‎09-22-2017 03:35 PM
Super User
Super User
Posts: 7,932

Re: How to split a column into multiple columns based on string delimiting

First split the column into multiple rows. You can parse out the text in the square brackets. Generate a counter so that PROC TRANSPOSE can name the columns.

data have ;
  length code $8 list $100 ;
  infile cards dsd dlm='|' truncover ;
  input code list ;
cards;
a1|abc [EPC],def [MoA]
a2|ghi [EPC],jkl  [MoA]
a3|mno [EPC],pqr [EPC],abo [MoA]
a4|stu [EPC],vw [CI],xyz [MoA]
;

data middle;
  set have ;
  length term $10 tag $3 ;
  do i=1 by 1 until (term=' ');
    term=strip(scan(list,i,','));
    tag=scan(term,2,'[]');
    if term ne ' ' then output;
  end;
run;

proc sort data=middle ;
  by code tag ;
run;

data middle2 ;
  set middle ;
  by code tag ;
  rep+1;
  if first.tag then rep=1;
run;

proc transpose data=middle2 out=want(drop=_name_) ;
  by code ;
  id tag rep ;
  var term ;
run;

image.png

View solution in original post


All Replies
Solution
‎09-22-2017 03:35 PM
Super User
Super User
Posts: 7,932

Re: How to split a column into multiple columns based on string delimiting

First split the column into multiple rows. You can parse out the text in the square brackets. Generate a counter so that PROC TRANSPOSE can name the columns.

data have ;
  length code $8 list $100 ;
  infile cards dsd dlm='|' truncover ;
  input code list ;
cards;
a1|abc [EPC],def [MoA]
a2|ghi [EPC],jkl  [MoA]
a3|mno [EPC],pqr [EPC],abo [MoA]
a4|stu [EPC],vw [CI],xyz [MoA]
;

data middle;
  set have ;
  length term $10 tag $3 ;
  do i=1 by 1 until (term=' ');
    term=strip(scan(list,i,','));
    tag=scan(term,2,'[]');
    if term ne ' ' then output;
  end;
run;

proc sort data=middle ;
  by code tag ;
run;

data middle2 ;
  set middle ;
  by code tag ;
  rep+1;
  if first.tag then rep=1;
run;

proc transpose data=middle2 out=want(drop=_name_) ;
  by code ;
  id tag rep ;
  var term ;
run;

image.png

Contributor
Posts: 48

Re: How to split a column into multiple columns based on string delimiting

Hi Tom,

 

Thank you for your quick response. I really appreciate your solution. I have one small clarification. The data presented is a sample of the actual excel file and it doesnot have any delimiters. There are 13,000 observations in the dataset. Presenting this case I dont think I can use the infile delimiter statement for this case.

 

I am attaching another sample with limited observations.

 

I appreciate your help.

 

Thank you very much.

Super User
Posts: 13,304

Re: How to split a column into multiple columns based on string delimiting

[ Edited ]

shasank wrote:

Hi Tom,

 

Thank you for your quick response. I really appreciate your solution. I have one small clarification. The data presented is a sample of the actual excel file and it doesnot have any delimiters. There are 13,000 observations in the dataset. Presenting this case I dont think I can use the infile delimiter statement for this case.

 

I am attaching another sample with limited observations.

 

I appreciate your help.

 

Thank you very much.


 

The infile in the provided example was because you did not provide example data in the form we could manipulate. So @Tom created a small example set to demostrate the technique. Your existing data would be used by  replacing "have" with the name of your SAS data set.

 

Also XLSX files are very poor for providing example data. Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Super User
Super User
Posts: 7,932

Re: How to split a column into multiple columns based on string delimiting

The first data step was just to create something for the rest of the code to work on.  You should start from the MIDDLE datastep and provide your actual input dataset name. You will need to modify the code to reflect your actual variable names. Also modify the lengths of the derived TERM and TAG variables to be long enough to handle your actual data.

Contributor
Posts: 48

Re: How to split a column into multiple columns based on string delimiting

Thank you for the reply. I have been working on it since my last reply.

 

Thank you for the help.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 1791 views
  • 0 likes
  • 3 in conversation