BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shasank
Quartz | Level 8

 

 

Hi SAS Community, I need you help in the following problem.

I have a dataset that looks like this

 

NDCPHARM_CLASSES
21434GLP-1 Receptor Agonist [EPC],Glucagon-Like Peptide 1 [Chemical/Ingredient],Glucagon-like Peptide-1 (GLP-1) Agonists [MoA]
21445.
21975Androgen [EPC],Androgen Receptor Agonists [MoA],Androstanes [Chemical/Ingredient]
21977Androgen [EPC],Androgen Receptor Agonists [MoA],Androstanes [Chemical/Ingredient]
23004.
23227Norepinephrine Reuptake Inhibitor [EPC],Norepinephrine Uptake Inhibitors [MoA]

 

and I am hoping to get an output like this 

 

NDCEPCMoAChemical Ingredient
21434GLP-1 Receptor Agonist Glucagon-like Peptide-1 (GLP-1) Agonists Glucagon-Like Peptide 1 
21445...
21975Androgen Androgen Receptor AgonistsAndrostanes
21977Androgen Androgen Receptor AgonistsAndrostanes
23004...
23227Norepinephrine Reuptake Inhibitor Norepinephrine Uptake Inhibitors.

I have got help for the same problem previously but, I get a WARNING: 2761 observations omitted due to missing values in the ID variables due to PROC TRANSPOSE. I don't want to loose any observation.  Can anyone help me with this problem.

 

Thank you in advance

 
data middle;
  set NDCdata(Keep= PHARM_CLASSES  NDC);
  length term $150 tag $20 ;
  do i=1 by 1 until (term=' ');
    term=strip(scan(PHARM_CLASSES,i,','));
    tag=scan(term,2,'[]');
    if term ne ' ' then output;
  end;
run;

proc sort data=middle ;
  by NDC tag ;
run;

data middle2 ;
  set middle ;
  by NDC tag ;
  rep+1;
  if first.tag then rep=1;
run;

proc transpose data=middle2 out=want(drop=_name_) ;
  by  NDC;
  id tag rep ;
  var term ;
run;

data cc;
set want (Keep = NDC Chemical_Ingredient1 EPC1 MoA1);
run;

proc sort data = b out = bsort Tagsort nodupkey;
by NDC;
run;

proc sort data = cc out = ccsort Tagsort nodupkey;
by NDC;
run;


 data dd ; 
   merge bsort ccsort ; 
   by NDC ; 
   where NDC ^= '.' ; 
   drop productID;
 run
1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Sure, that's enough to simplify.  This is untested, so might need minor tweaking:

 

data want;

set have;

length EPC MoA Chemical_Ingredient term $ 150 tag $ 20;

do j=1 by 1 until (term=' ');

   term = scan(PHARM_CLASSES, j, ',');

   if term > ' ' then do;

      tag = scan(term, 2, '][');

      if upcase(tag) = 'MOA' and MoA=' ' then MoA = scan(term, 1, '][');

      else if upcase(tag) = 'EPC' and EPC=' ' then EPC = scan(term, 1, '][');

      else if upcase(tag) = 'CHEMICAL/INGREDIENT' and Chemical_Ingredient = ' ' then Chemical_Ingredient = scan(term, 1, '][');

   end;

end;

drop j term tag PHARM_CLASSES;

run;

 

You might want to keep PHARM_CLASSSES at first, until you are confident that the results are correct.

View solution in original post

6 REPLIES 6
Astounding
PROC Star

It is possible that the original code  you are working with is overkill and could be simplified considerably.  The right route depends on how well you know what is in your data.  For a single observation, is it possible that there is more than one MoA compound listed?  (That question applies to EPC and Chemical/Ingredient as well.)  If not, extreme simplifications are in order.

shasank
Quartz | Level 8
Hi Astounding,
Yes, There are more than 1 occurrences, but I need only the 1st occurrence i:e I Need only the 1st MoA even if 2 MoA or greater are present.
Will this simplify the problem??
Astounding
PROC Star

Sure, that's enough to simplify.  This is untested, so might need minor tweaking:

 

data want;

set have;

length EPC MoA Chemical_Ingredient term $ 150 tag $ 20;

do j=1 by 1 until (term=' ');

   term = scan(PHARM_CLASSES, j, ',');

   if term > ' ' then do;

      tag = scan(term, 2, '][');

      if upcase(tag) = 'MOA' and MoA=' ' then MoA = scan(term, 1, '][');

      else if upcase(tag) = 'EPC' and EPC=' ' then EPC = scan(term, 1, '][');

      else if upcase(tag) = 'CHEMICAL/INGREDIENT' and Chemical_Ingredient = ' ' then Chemical_Ingredient = scan(term, 1, '][');

   end;

end;

drop j term tag PHARM_CLASSES;

run;

 

You might want to keep PHARM_CLASSSES at first, until you are confident that the results are correct.

shasank
Quartz | Level 8
Worked like a charm.

Thanks a lot.
mkeintz
PROC Star

This is a late response, but is a little easier to generalize to several/dozens of variables.

 

data have;
  input NDC PHARM_CLASSES :$&200. ;
datalines;
21434 GLP-1 Receptor Agonist [EPC],Glucagon-Like Peptide 1 [Chemical/Ingredient],Glucagon-like Peptide-1 (GLP-1) Agonists [MoA] 
21445 . 
21975 Androgen [EPC],Androgen Receptor Agonists [MoA],Androstanes [Chemical/Ingredient] 
21977 Androgen [EPC],Androgen Receptor Agonists [MoA],Androstanes [Chemical/Ingredient] 
23004 . 
23227 Norepinephrine Reuptake Inhibitor [EPC],Norepinephrine Uptake Inhibitors [MoA] 
run;
data want (drop=_:);
  set have;
  array vars{3} $60 epc chemical_ingredient moa;

  length _tag_list $40 ;
  retain _tag_list '[EPC] [Chemical/Ingredient] [MoA]';
 
  do _P=1  by 1 while(scan(pharm_classes,_P,',')^='');       
     _value_tag_pair=scan(pharm_classes,_P,',');             
	 _v=findw(_tag_list,trim(scan(_value_tag_pair,-1,' ')),' ','e');    
	 if _v^=0 then vars{_v}=scan(_value_tag_pair,1,'[');               
  end;
run;

 

This works by searching the tag-in-hand against the tag_list, yielding the position of the current tag in the list.  This position (_V) is used to assign a value to the appropriate element in character-array VARS.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
shasank
Quartz | Level 8
Thanks alot. This helps.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1350 views
  • 2 likes
  • 3 in conversation