BookmarkSubscribeRSS Feed
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

SAS 9.4

 

I have a data set with 200+ variables, 64 are score variables at 3 different time points, baseline, 3 months, 12 month, (e.g. score1_baseline - score64_baseline, score1_3mnth - score64_3mnth). I need to convert this from a wide data set to a long set with 1 variable indicating the 3 timepoints. This would lead to each observation having 3 records as a long versus the 1 record as a wide. 

 

Is there a good process for converting 64 variables at one time from wide to long without losing any of the other data? Thank you 

6 REPLIES 6
Reeza
Super User

DELETED.

GS2
Obsidian | Level 7 GS2
Obsidian | Level 7
Thank you for googling and finding a website. I found that webpage as well and am not having success applying it. That is why I reached out to see if there may be another method that I am able to apply
Kurt_Bremser
Super User

I would go the whole way and have a separate observation for each score and timepoint.

Transpose all variables, then, in a follow-up data step, extract the score number and timepoint from _NAME_.

Kurt_Bremser
Super User

A quick example with 2 scores that automatically expands to 64 on its own:

data have;
input
  id $
  score1_baseline score1_3mnth score1_12mnth
  score2_baseline score2_3mnth score2_12mnth
;
datalines;
A 1 2 3 4 5 6
B 2 3 4 5 6 7
;

proc transpose
  data=have
  out=long (rename=(col1=score))
;
by id;
var score:;
run;

data want;
set long;
scorenum = input(substr(scan(_name_,1,"_"),6),2.);
if scan(_name_,2,"_") = "baseline"
then timepoint = 0;
else timepoint = input(compress(scan(_name_,-1,"_"),"mnth"),2.);
drop _name_;
run;
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

Thank you this is a good start. I attempted to modify the code as my variables were slightly different than described. As such I have a couple questions:
1. Is it possible to scan for a phrase in the variable name instead of scanning the position of the name? I noticed a couple had the "baseline" indicator in the 3rd word position or not had a long name than 32 characters so it was shortened to "baseli"
2. The same applies to the 3 and 12 months where it would be helpful to search for "3m" or "12m"
3. When I ran this I got the score and the timepoint but not the variable name and without the variable name I cannot tell which score the number is associated with. Does this make sense?

 

proc sort data = have;
	by pt_study_deidentified;
run;

proc transpose
  data=have
  out=long (rename=(col1=score));
by pt_study_deidentified;
var back_pain_vas___1Baseline leg_pain_vas1___1Baseline odi_score_pain_intensity___1Base odi_score_personal_care___1Basel
		odi_score_lifting___1Baseline odi_score_walking___1Baseline odi_score_sitting___1Baseline odi_score_standing___1Baseline
		odi_score_sleeping___1Baseline odi_score_sex_life___1Baseline odi_score_social_life___1Baselin odi_score_travelling___1Baseline
		total_odi_sum_score___1Baseline ODI_percent_baseline mobility___1Baseline self_care___1Baseline usual_activities___1Baseline
		pain_discomfort___1Baseline anxiety_depression___1Baseline eq_vas___1Baseline actual_qaly_score___1Baseline promis29_pfa11___1Baseline
		promis29_pfa21___1Baseline promis29_pfa23___1Baseline promis29_pfa53___1Baseline promis29_edanx01___1Baseline promis29_edanx40___1Baseline
		promis29_edanx41___1Baseline promis29_edanx53___1Baseline promis29_eddep04___1Baseline promis29_eddep06___1Baseline
		promis29_eddep29___1Baseline promis29_eddep41___1Baseline promis29_hi7___1Baseline promis29_an3___1Baseline promis29_fatexp41___1Baseline
		promis29_fatexp40___1Baseline promis29_sleep109___1Baseline promis29_sleep116___1Baseline promis29_sleep20___1Baseline
		promis29_sleep44___1Baseline promis29_srpper11_caps___1Baseli promis29_srpper18_caps___1Baseli promis29_srpper23_caps___1Baseli
		promis29_srpper46_caps___1Baseli promis29_painin9___1Baseline promis29_painin22___1Baseline promis29_painin31___1Baseline
		promis29_painin34___1Baseline promis29_global07___1Baseline pf_raw___1Baseline pf_t_score___1Baseline anxiety_raw___1Baseline
		anxiety_t_score___1Baseline depression_raw___1Baseline depression_t_score___1Baseline fatigue_raw___1Baseline fatigue_t_score___1Baseline
		sd_raw___1Baseline sd_t_score___1Baseline social_raw___1Baseline social_t_score___1Baseline pain_raw___1Baseline pain_t_score___1Baseline
	back_pain_vas___3month leg_pain_vas1___3month odi_score_pain_intensity___3mont odi_score_personal_care___3month odi_score_lifting___3month
		odi_score_walking___3month odi_score_sitting___3month odi_score_standing___3month odi_score_sleeping___3month odi_score_sex_life___3month
		odi_score_social_life___3month odi_score_travelling___3month total_odi_sum_score___3month ODI_percent_3months mobility___3month
		self_care___3month usual_activities___3month pain_discomfort___3month anxiety_depression___3month eq_vas___3month actual_qaly_score___3month
		promis29_pfa11___3month promis29_pfa21___3month promis29_pfa23___3month promis29_pfa53___3month promis29_edanx01___3month
		promis29_edanx40___3month promis29_edanx41___3month promis29_edanx53___3month promis29_eddep04___3month promis29_eddep06___3month
		promis29_eddep29___3month promis29_eddep41___3month promis29_hi7___3month promis29_an3___3month promis29_fatexp41___3month
		promis29_fatexp40___3month promis29_sleep109___3month promis29_sleep116___3month promis29_sleep20___3month promis29_sleep44___3month
		promis29_srpper11_caps___3month promis29_srpper18_caps___3month promis29_srpper23_caps___3month promis29_srpper46_caps___3month
		promis29_painin9___3month promis29_painin22___3month promis29_painin31___3month promis29_painin34___3month promis29_global07___3month
		pf_raw___3month pf_t_score___3month anxiety_raw___3month anxiety_t_score___3month depression_raw___3month depression_t_score___3month
		fatigue_raw___3month fatigue_t_score___3month sd_raw___3month sd_t_score___3month social_raw___3month social_t_score___3month
		pain_raw___3month pain_t_score___3month
	back_pain_vas__12month leg_pain_vas1__12month odi_score_pain_intensity__12mont odi_score_personal_care__12month odi_score_lifting__12month
		odi_score_walking__12month odi_score_sitting__12month odi_score_standing__12month odi_score_sleeping__12month odi_score_sex_life__12month
		odi_score_social_life__12month odi_score_travelling__12month total_odi_sum_score__12month ODI_percent_12months mobility__12month
		self_care__12month usual_activities__12month pain_discomfort__12month anxiety_depression__12month eq_vas__12month
		actual_qaly_score__12month promis29_pfa11__12month promis29_pfa21__12month promis29_pfa23__12month promis29_pfa53__12month
		promis29_edanx01__12month promis29_edanx40__12month promis29_edanx41__12month promis29_edanx53__12month promis29_eddep04__12month
		promis29_eddep06__12month promis29_eddep29__12month promis29_eddep41__12month promis29_hi7__12month promis29_an3__12month
		promis29_fatexp41__12month promis29_fatexp40__12month promis29_sleep109__12month promis29_sleep116__12month promis29_sleep20__12month
		promis29_sleep44__12month promis29_srpper11_caps__12month promis29_srpper18_caps__12month promis29_srpper23_caps__12month
		promis29_srpper46_caps__12month promis29_painin9__12month promis29_painin22__12month promis29_painin31__12month promis29_painin34__12month
		promis29_global07__12month pf_raw__12month pf_t_score__12month anxiety_raw__12month anxiety_t_score__12month depression_raw__12month
		depression_t_score__12month fatigue_raw__12month fatigue_t_score__12month sd_raw__12month sd_t_score__12month social_raw__12month
		social_t_score__12month pain_raw__12month pain_t_score__12month;
run;

data want;
set long;
scorenum = input(substr(scan(_name_,1,"_"),6),2.);
if scan(_name_,2,"_") = "baseline"
then timepoint = 0;
else timepoint = input(compress(scan(_name_,-1,"_"),"month"),2.);
drop _name_;
run;

Ideally, I the final product would look like below but I am not sure if this is possible given the data. Thank you 

 

data want;
input
pt_study_deidentified timepoint $ back_pain_vas leg_pain_vas1;
datalines;
1 baseline 2 3
1 3month 3 4
1 12month 5 5
2 baseline 2 3
2 3month 3 4
2 12month 5 5
;

 

Kurt_Bremser
Super User

First of all, you do not have score numbers throughout (e.g. pain_t_score).

But you can extract everything before a sequence of two underlines, and get the time point from the rest:

data long;
input _name_ $32.;
datalines;
back_pain_vas___1Baseline
leg_pain_vas1___1Baseline
back_pain_vas___3month
leg_pain_vas1___3month
back_pain_vas__12month
leg_pain_vas1__12month
;

data want;
set long;
i = index(_name_,"__");
study = substr(_name_,1,i-1);
select (upcase(substr(_name_,i+2,3)));
  when ("_1B") timepoint = 1;
  when ("_3M") timepoint = 3;
  when ("12M") timepoint = 12;
  otherwise; /* insert alarm code here if needed */
end;
drop i;
run;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2002 views
  • 2 likes
  • 3 in conversation