BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shasank
Quartz | Level 8

Hi SAS Community,

 

I would like your help insolving the below problem.

Dataset in question:-

 

Product codeClass of drug
a1

abc [EPC],

def [MoA]

a2

ghi [EPC],

jkl  [MoA]

a3

mno [EPC],

pqr [EPC],

abo [MoA]

a4

stu [EPC],

vw [CI],

xyz [MoA]

 

Expected output:-

 

PRODUCTNDCPHARM_CLASSESEPC1EPC2 MoA1 MoA2 CI
a1

abc [EPC],

def [MoA]

abc  [EPC]. def[MoA] . .
a2

ghi [EPC],

jkl  [MoA]

ghi[EPC]. jkl[MoA] . .
a3

mno [EPC],

pqr [EPC],

abo [MoA]

mno[EPC]pqr [EPC] abo[MoA] . .
a4

stu [EPC],

vw [CI],

xyz [MoA]

stu [EPC].vw[MoA] . xyz [CI]

 

I tried using Scan option in a data statement and there was no success. I also used ',' as a dlm to split the variable but the position of EPC, MoA and CI is not same across the dataset.

 

Please Help.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

First split the column into multiple rows. You can parse out the text in the square brackets. Generate a counter so that PROC TRANSPOSE can name the columns.

data have ;
  length code $8 list $100 ;
  infile cards dsd dlm='|' truncover ;
  input code list ;
cards;
a1|abc [EPC],def [MoA]
a2|ghi [EPC],jkl  [MoA]
a3|mno [EPC],pqr [EPC],abo [MoA]
a4|stu [EPC],vw [CI],xyz [MoA]
;

data middle;
  set have ;
  length term $10 tag $3 ;
  do i=1 by 1 until (term=' ');
    term=strip(scan(list,i,','));
    tag=scan(term,2,'[]');
    if term ne ' ' then output;
  end;
run;

proc sort data=middle ;
  by code tag ;
run;

data middle2 ;
  set middle ;
  by code tag ;
  rep+1;
  if first.tag then rep=1;
run;

proc transpose data=middle2 out=want(drop=_name_) ;
  by code ;
  id tag rep ;
  var term ;
run;

image.png

View solution in original post

5 REPLIES 5
Tom
Super User Tom
Super User

First split the column into multiple rows. You can parse out the text in the square brackets. Generate a counter so that PROC TRANSPOSE can name the columns.

data have ;
  length code $8 list $100 ;
  infile cards dsd dlm='|' truncover ;
  input code list ;
cards;
a1|abc [EPC],def [MoA]
a2|ghi [EPC],jkl  [MoA]
a3|mno [EPC],pqr [EPC],abo [MoA]
a4|stu [EPC],vw [CI],xyz [MoA]
;

data middle;
  set have ;
  length term $10 tag $3 ;
  do i=1 by 1 until (term=' ');
    term=strip(scan(list,i,','));
    tag=scan(term,2,'[]');
    if term ne ' ' then output;
  end;
run;

proc sort data=middle ;
  by code tag ;
run;

data middle2 ;
  set middle ;
  by code tag ;
  rep+1;
  if first.tag then rep=1;
run;

proc transpose data=middle2 out=want(drop=_name_) ;
  by code ;
  id tag rep ;
  var term ;
run;

image.png

shasank
Quartz | Level 8

Hi Tom,

 

Thank you for your quick response. I really appreciate your solution. I have one small clarification. The data presented is a sample of the actual excel file and it doesnot have any delimiters. There are 13,000 observations in the dataset. Presenting this case I dont think I can use the infile delimiter statement for this case.

 

I am attaching another sample with limited observations.

 

I appreciate your help.

 

Thank you very much.

ballardw
Super User

@shasank wrote:

Hi Tom,

 

Thank you for your quick response. I really appreciate your solution. I have one small clarification. The data presented is a sample of the actual excel file and it doesnot have any delimiters. There are 13,000 observations in the dataset. Presenting this case I dont think I can use the infile delimiter statement for this case.

 

I am attaching another sample with limited observations.

 

I appreciate your help.

 

Thank you very much.


 

The infile in the provided example was because you did not provide example data in the form we could manipulate. So @Tom created a small example set to demostrate the technique. Your existing data would be used by  replacing "have" with the name of your SAS data set.

 

Also XLSX files are very poor for providing example data. Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

Tom
Super User Tom
Super User

The first data step was just to create something for the rest of the code to work on.  You should start from the MIDDLE datastep and provide your actual input dataset name. You will need to modify the code to reflect your actual variable names. Also modify the lengths of the derived TERM and TAG variables to be long enough to handle your actual data.

shasank
Quartz | Level 8

Thank you for the reply. I have been working on it since my last reply.

 

Thank you for the help.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 15145 views
  • 0 likes
  • 3 in conversation