Solved: Array to scan character columns, concatenate values, replace old value...

mtr91 · Posted 06-07-2022 08:19 PM

I really cannot figure this one out...

I have 15 columns (drug1-drug15) and I need to combine select values from one column with a value from another column (this will be different combinations for every row, and only for select rows, and some rows will need to combine multiple values).

Here is what I have compared to what I need. For example, for Observation 1 I need to combine 'carboxy-thc' from drug1 and 'thc' from drug2 and enter the new value 'thc/carboxy-thc' to drug1 and leave drug2 blank. I think I need to use an array, but I do not have experience using them.

HAVE
Obs	drug1	drug2	drug3	drug4	…
1	carboxy-thc	thc
2	benzoylecgonine	cocaethylene	cocaine
3	carboxy-thc	diazepam	nordiazepam	thc
4	amphetamines	methamphetamine	midazolam
5	fluoxetine	norfluoxetine	thc

NEED
Obs	drug1	drug2	drug3	drug4	…
1	thc/carboxy-thc
2	cocaine/benzoylecgonine/cocaethlene
3	thc/carboxy-thc	diazepam/nordiazepam
4	methamphetamines/amphetamines	midazolam
5	fluoxetine/norfluoxetine	thc

Any help is much appreciated! Thank you!

Tom · Posted 06-08-2022 12:32 AM

That might be the result of transposing.

But the drug dictionary information you need would look more like:

data drugs ;
  input group drug :$30. ;
cards;
1 thc
1 carboxy-thc            
2 cocaine
2 benzoylecgonine
2 cocaethlene            
3 diazepam
3 nordiazepam         
4 methamphetamines
4 amphetamines
5 fluoxetine
5 norfluoxetine
;

So if HAVE contains an ID variable to unique identify the observations and DRUG1 to DRUG15 then you could use PROC TRANSPOSE to first convert the current HAVE dataset from 15 variable to 15 observations.

proc transpose data=have out=tall (drop=_name_ rename=(col1=drug));
  by id;
  var drug1-drug15;
run;

Now combine that with the drug dictionary and re-order.

proc sql ;
create table tall_groups as
  select distinct a.id, b.group, a.drug 
   from tall a
   inner join drugs b
   on a.drug=b.drug
   order by a.id, b.group, a.drug
;
quit;

And finally you can collapse the multiple drugs for a group into one observation.

data tall_concat ;
   do until(last.group);
      set tall_group
      by id group;
      length drug_group $100 ;
      drug_group=catx('/',drug_group,drug);
   end;
run;

View solution in original post

Tom · Posted 06-07-2022 09:15 PM

You will need additional information you have not provided to actually be able to code this. In particular how did you know that "thc" and "carboxy-thc" should be combined? Why not combine carboxy-thc with diazepam?

Anyway the first thing you probably need to do is transpose the 15 drug variables into one drug variable with up to 15 observations. Then once you have mapped the drug names to something (drug class? drug generic name?) that will allow you to combine them you can group them and collapse the similar names into one observations. You could then convert that back into the wide format if you need it for a report.

mtr91 · Posted 06-07-2022 09:40 PM

Thanks for the feedback.

The drug pairing are based on parent-metabolite associations. So 'carboxy-thc' is the metabolite of 'thc'. So I want to combine the parent-metabolite drugs into a single drug.

Is this what you are describing?

Obs	drug
1	carboxy-thc
1	thc
2	benzoylecgonine
2	cocaethylene
2	cocaine
3	carboxy-thc
3	diazepam
3	nordiazepam
3	thc
4	amphetamines
4	methamphetamine
4	midazolam

Tom · Posted 06-08-2022 12:32 AM

That might be the result of transposing.

But the drug dictionary information you need would look more like:

data drugs ;
  input group drug :$30. ;
cards;
1 thc
1 carboxy-thc            
2 cocaine
2 benzoylecgonine
2 cocaethlene            
3 diazepam
3 nordiazepam         
4 methamphetamines
4 amphetamines
5 fluoxetine
5 norfluoxetine
;

So if HAVE contains an ID variable to unique identify the observations and DRUG1 to DRUG15 then you could use PROC TRANSPOSE to first convert the current HAVE dataset from 15 variable to 15 observations.

proc transpose data=have out=tall (drop=_name_ rename=(col1=drug));
  by id;
  var drug1-drug15;
run;

Now combine that with the drug dictionary and re-order.

proc sql ;
create table tall_groups as
  select distinct a.id, b.group, a.drug 
   from tall a
   inner join drugs b
   on a.drug=b.drug
   order by a.id, b.group, a.drug
;
quit;

And finally you can collapse the multiple drugs for a group into one observation.

data tall_concat ;
   do until(last.group);
      set tall_group
      by id group;
      length drug_group $100 ;
      drug_group=catx('/',drug_group,drug);
   end;
run;

mtr91 · Posted 06-08-2022 12:02 PM

Thank you! Big help

Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

Re: Array to scan character columns, concatenate values, replace old values

SAS Innovate 2025: Call for Content