SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Spliting a String into 3 observations using an Identifier

Accepted Solution Solved
Reply
Contributor
Posts: 48
Accepted Solution

Spliting a String into 3 observations using an Identifier

 

 

Hi SAS Community, I need you help in the following problem.

I have a dataset that looks like this

 

NDCPHARM_CLASSES
21434GLP-1 Receptor Agonist [EPC],Glucagon-Like Peptide 1 [Chemical/Ingredient],Glucagon-like Peptide-1 (GLP-1) Agonists [MoA]
21445.
21975Androgen [EPC],Androgen Receptor Agonists [MoA],Androstanes [Chemical/Ingredient]
21977Androgen [EPC],Androgen Receptor Agonists [MoA],Androstanes [Chemical/Ingredient]
23004.
23227Norepinephrine Reuptake Inhibitor [EPC],Norepinephrine Uptake Inhibitors [MoA]

 

and I am hoping to get an output like this 

 

NDCEPCMoAChemical Ingredient
21434GLP-1 Receptor Agonist Glucagon-like Peptide-1 (GLP-1) Agonists Glucagon-Like Peptide 1 
21445...
21975Androgen Androgen Receptor AgonistsAndrostanes
21977Androgen Androgen Receptor AgonistsAndrostanes
23004...
23227Norepinephrine Reuptake Inhibitor Norepinephrine Uptake Inhibitors.

I have got help for the same problem previously but, I get a WARNING: 2761 observations omitted due to missing values in the ID variables due to PROC TRANSPOSE. I don't want to loose any observation.  Can anyone help me with this problem.

 

Thank you in advance

 
data middle;
  set NDCdata(Keep= PHARM_CLASSES  NDC);
  length term $150 tag $20 ;
  do i=1 by 1 until (term=' ');
    term=strip(scan(PHARM_CLASSES,i,','));
    tag=scan(term,2,'[]');
    if term ne ' ' then output;
  end;
run;

proc sort data=middle ;
  by NDC tag ;
run;

data middle2 ;
  set middle ;
  by NDC tag ;
  rep+1;
  if first.tag then rep=1;
run;

proc transpose data=middle2 out=want(drop=_name_) ;
  by  NDC;
  id tag rep ;
  var term ;
run;

data cc;
set want (Keep = NDC Chemical_Ingredient1 EPC1 MoA1);
run;

proc sort data = b out = bsort Tagsort nodupkey;
by NDC;
run;

proc sort data = cc out = ccsort Tagsort nodupkey;
by NDC;
run;


 data dd ; 
   merge bsort ccsort ; 
   by NDC ; 
   where NDC ^= '.' ; 
   drop productID;
 run

Accepted Solutions
Solution
‎10-12-2017 04:27 PM
Super User
Posts: 6,626

Re: Spliting a String into 3 observations using an Identifier

Sure, that's enough to simplify.  This is untested, so might need minor tweaking:

 

data want;

set have;

length EPC MoA Chemical_Ingredient term $ 150 tag $ 20;

do j=1 by 1 until (term=' ');

   term = scan(PHARM_CLASSES, j, ',');

   if term > ' ' then do;

      tag = scan(term, 2, '][');

      if upcase(tag) = 'MOA' and MoA=' ' then MoA = scan(term, 1, '][');

      else if upcase(tag) = 'EPC' and EPC=' ' then EPC = scan(term, 1, '][');

      else if upcase(tag) = 'CHEMICAL/INGREDIENT' and Chemical_Ingredient = ' ' then Chemical_Ingredient = scan(term, 1, '][');

   end;

end;

drop j term tag PHARM_CLASSES;

run;

 

You might want to keep PHARM_CLASSSES at first, until you are confident that the results are correct.

View solution in original post


All Replies
Super User
Posts: 6,626

Re: Spliting a String into 3 observations using an Identifier

It is possible that the original code  you are working with is overkill and could be simplified considerably.  The right route depends on how well you know what is in your data.  For a single observation, is it possible that there is more than one MoA compound listed?  (That question applies to EPC and Chemical/Ingredient as well.)  If not, extreme simplifications are in order.

Contributor
Posts: 48

Re: Spliting a String into 3 observations using an Identifier

Posted in reply to Astounding
Hi Astounding,
Yes, There are more than 1 occurrences, but I need only the 1st occurrence i:e I Need only the 1st MoA even if 2 MoA or greater are present.
Will this simplify the problem??
Solution
‎10-12-2017 04:27 PM
Super User
Posts: 6,626

Re: Spliting a String into 3 observations using an Identifier

Sure, that's enough to simplify.  This is untested, so might need minor tweaking:

 

data want;

set have;

length EPC MoA Chemical_Ingredient term $ 150 tag $ 20;

do j=1 by 1 until (term=' ');

   term = scan(PHARM_CLASSES, j, ',');

   if term > ' ' then do;

      tag = scan(term, 2, '][');

      if upcase(tag) = 'MOA' and MoA=' ' then MoA = scan(term, 1, '][');

      else if upcase(tag) = 'EPC' and EPC=' ' then EPC = scan(term, 1, '][');

      else if upcase(tag) = 'CHEMICAL/INGREDIENT' and Chemical_Ingredient = ' ' then Chemical_Ingredient = scan(term, 1, '][');

   end;

end;

drop j term tag PHARM_CLASSES;

run;

 

You might want to keep PHARM_CLASSSES at first, until you are confident that the results are correct.

Contributor
Posts: 48

Re: Spliting a String into 3 observations using an Identifier

Posted in reply to Astounding
Worked like a charm.

Thanks a lot.
Trusted Advisor
Posts: 1,309

Re: Spliting a String into 3 observations using an Identifier

This is a late response, but is a little easier to generalize to several/dozens of variables.

 

data have;
  input NDC PHARM_CLASSES :$&200. ;
datalines;
21434 GLP-1 Receptor Agonist [EPC],Glucagon-Like Peptide 1 [Chemical/Ingredient],Glucagon-like Peptide-1 (GLP-1) Agonists [MoA] 
21445 . 
21975 Androgen [EPC],Androgen Receptor Agonists [MoA],Androstanes [Chemical/Ingredient] 
21977 Androgen [EPC],Androgen Receptor Agonists [MoA],Androstanes [Chemical/Ingredient] 
23004 . 
23227 Norepinephrine Reuptake Inhibitor [EPC],Norepinephrine Uptake Inhibitors [MoA] 
run;
data want (drop=_:);
  set have;
  array vars{3} $60 epc chemical_ingredient moa;

  length _tag_list $40 ;
  retain _tag_list '[EPC] [Chemical/Ingredient] [MoA]';
 
  do _P=1  by 1 while(scan(pharm_classes,_P,',')^='');       
     _value_tag_pair=scan(pharm_classes,_P,',');             
	 _v=findw(_tag_list,trim(scan(_value_tag_pair,-1,' ')),' ','e');    
	 if _v^=0 then vars{_v}=scan(_value_tag_pair,1,'[');               
  end;
run;

 

This works by searching the tag-in-hand against the tag_list, yielding the position of the current tag in the list.  This position (_V) is used to assign a value to the appropriate element in character-array VARS.

Contributor
Posts: 48

Re: Spliting a String into 3 observations using an Identifier

Thanks alot. This helps.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 285 views
  • 2 likes
  • 3 in conversation