remove text string from column names

Fred_Gavin · Posted 01-10-2012 07:17 PM

Dear All,

I have two survey datasets collected several months apart with the exact same questions. Therefore the column names of two datasets are partially the same, except in the first one it looks like baseline_question1, baseline_question2.....

and in the second it is six_month_question1, six_month_question2,........

There are nearly a hundred questions. I would like to transform them into:

Question1, Question2, Question3..... Time

Baseline

.......

Six_month

........

Therefore I need to remove the prefix of column names from both datasets, and combine them later.

Can anyone please suggest a way to do this?

Thank you in advance.

Tom · Posted 01-10-2012 07:44 PM

Because the names end in numeric suffixes should be able to rename very easily.

data want ;
   length time $20 ;
   set baseline (in=in1 rename=(baseline_question1-baseline_question20=question1-question20))
       month6 (in=in2 rename=(six_month_question1-six_month_question20=question1-question20))
   ;
   if in1 then time='Baseline';
   if in2 then time='Six Months';
run;

Fred_Gavin · Posted 01-10-2012 08:08 PM

Thanks for the suggestion.

They do actually not end in numeric suffixes. I just put "question1", "question2", as an example for the simplicity. Sorry for the misleading.

In the dataset, for instance, the names are " baseline_overallsafetyofChart", "baseline_indication", and so on.

So the only common part is the the prefix "baseline" for the first dataset, and "six_month" for the second dataset.

Thanks and regards

art297 · Posted 01-10-2012 08:18 PM

Then, I think your easiest solution would be to use the tranwrd word function, using proc sql, to create a macro variable based on dictionary.columns, the does the recode in your datastep.

Linlin · Posted 01-10-2012 08:51 PM

data base;

input baseline_overallsafetyofChart baseline_indication $;

cards;

1 a

11 aa

;

run;

data six;

input six_overallsafetyofChart six_indication $;

cards;

2 b

22 bb

;

run;

proc transpose data=base(obs=1) out=temp1;

var _all_;

run;

proc transpose data=six(obs=1) out=temp2;

var _all_;

run;

data _null_;

set temp1 end=last ;

call symputx(cats('base',strip(_n_)),_name_);

if last then call symputx('nbase',_n_);

run;

data _null_;

set temp2 end=last ;

call symputx(cats('six',strip(_n_)),_name_);

if last then call symputx('nsix',_n_);

run;

%put _user_;

%macro rename;

data new_base;

set base;

rename %do i=1 %to &nbase;

&&base&i=%scan(&&base&i,-1,'_')

%end;;

run;

data new_six;

set six;

rename %do i=1 %to &nsix;

&&six&i=%scan(&&six&i,-1,'_')

%end;;

run;

%mend;

%rename;

data both;

length time $ 15;

set new_base (in=a)

new_six(in=b);

if a then time='Baseline';

if b then time='Six Months';

run;

proc print;run;

overallsafetyof

Obs time Chart indication

1 Baseline 1 a

2 Baseline 11 aa

3 Six Months 2 b

4 Six Months 22 bb

Linlin

Ksharp · Posted 01-11-2012 12:38 AM

How about:

data base;
input baseline_overallsafetyofChart baseline_indication $;
cards;
1 a
11 aa
;
run;

data six;
input six_month_overallsafetyofChart six_month_indication $;
cards;
2 b
22 bb
;
run;

proc sql noprint ;
 select catx('=',name,scan(name,-1,'_')) into : base separated by ' '
  from dictionary.columns 
   where libname='WORK' and memname='BASE' and name like 'baseline~_%' escape '~';

 select catx('=',name,scan(name,-1,'_')) into : six separated by ' '
  from dictionary.columns 
   where libname='WORK' and memname='SIX' and name like 'six~_month~_%' escape '~';

quit;
%put &base &six;

proc datasets library=work nolist;
 modify base;
  rename &base ;
 modify six;
  rename &six;
quit;

Ksharp

Fred_Gavin · Posted 01-11-2012 12:41 AM

Thanks for all the replies.

I will test both suggested way soon.

Regards