BookmarkSubscribeRSS Feed
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

SAS 9.4

 

I have a data set with 200+ variables, 64 are score variables at 3 different time points, baseline, 3 months, 12 month, (e.g. score1_baseline - score64_baseline, score1_3mnth - score64_3mnth). I need to convert this from a wide data set to a long set with 1 variable indicating the 3 timepoints. This would lead to each observation having 3 records as a long versus the 1 record as a wide. 

 

Is there a good process for converting 64 variables at one time from wide to long without losing any of the other data? Thank you 

6 REPLIES 6
Reeza
Super User

DELETED.

GS2
Obsidian | Level 7 GS2
Obsidian | Level 7
Thank you for googling and finding a website. I found that webpage as well and am not having success applying it. That is why I reached out to see if there may be another method that I am able to apply
Kurt_Bremser
Super User

I would go the whole way and have a separate observation for each score and timepoint.

Transpose all variables, then, in a follow-up data step, extract the score number and timepoint from _NAME_.

Kurt_Bremser
Super User

A quick example with 2 scores that automatically expands to 64 on its own:

data have;
input
  id $
  score1_baseline score1_3mnth score1_12mnth
  score2_baseline score2_3mnth score2_12mnth
;
datalines;
A 1 2 3 4 5 6
B 2 3 4 5 6 7
;

proc transpose
  data=have
  out=long (rename=(col1=score))
;
by id;
var score:;
run;

data want;
set long;
scorenum = input(substr(scan(_name_,1,"_"),6),2.);
if scan(_name_,2,"_") = "baseline"
then timepoint = 0;
else timepoint = input(compress(scan(_name_,-1,"_"),"mnth"),2.);
drop _name_;
run;
GS2
Obsidian | Level 7 GS2
Obsidian | Level 7

Thank you this is a good start. I attempted to modify the code as my variables were slightly different than described. As such I have a couple questions:
1. Is it possible to scan for a phrase in the variable name instead of scanning the position of the name? I noticed a couple had the "baseline" indicator in the 3rd word position or not had a long name than 32 characters so it was shortened to "baseli"
2. The same applies to the 3 and 12 months where it would be helpful to search for "3m" or "12m"
3. When I ran this I got the score and the timepoint but not the variable name and without the variable name I cannot tell which score the number is associated with. Does this make sense?

 

proc sort data = have;
	by pt_study_deidentified;
run;

proc transpose
  data=have
  out=long (rename=(col1=score));
by pt_study_deidentified;
var back_pain_vas___1Baseline leg_pain_vas1___1Baseline odi_score_pain_intensity___1Base odi_score_personal_care___1Basel
		odi_score_lifting___1Baseline odi_score_walking___1Baseline odi_score_sitting___1Baseline odi_score_standing___1Baseline
		odi_score_sleeping___1Baseline odi_score_sex_life___1Baseline odi_score_social_life___1Baselin odi_score_travelling___1Baseline
		total_odi_sum_score___1Baseline ODI_percent_baseline mobility___1Baseline self_care___1Baseline usual_activities___1Baseline
		pain_discomfort___1Baseline anxiety_depression___1Baseline eq_vas___1Baseline actual_qaly_score___1Baseline promis29_pfa11___1Baseline
		promis29_pfa21___1Baseline promis29_pfa23___1Baseline promis29_pfa53___1Baseline promis29_edanx01___1Baseline promis29_edanx40___1Baseline
		promis29_edanx41___1Baseline promis29_edanx53___1Baseline promis29_eddep04___1Baseline promis29_eddep06___1Baseline
		promis29_eddep29___1Baseline promis29_eddep41___1Baseline promis29_hi7___1Baseline promis29_an3___1Baseline promis29_fatexp41___1Baseline
		promis29_fatexp40___1Baseline promis29_sleep109___1Baseline promis29_sleep116___1Baseline promis29_sleep20___1Baseline
		promis29_sleep44___1Baseline promis29_srpper11_caps___1Baseli promis29_srpper18_caps___1Baseli promis29_srpper23_caps___1Baseli
		promis29_srpper46_caps___1Baseli promis29_painin9___1Baseline promis29_painin22___1Baseline promis29_painin31___1Baseline
		promis29_painin34___1Baseline promis29_global07___1Baseline pf_raw___1Baseline pf_t_score___1Baseline anxiety_raw___1Baseline
		anxiety_t_score___1Baseline depression_raw___1Baseline depression_t_score___1Baseline fatigue_raw___1Baseline fatigue_t_score___1Baseline
		sd_raw___1Baseline sd_t_score___1Baseline social_raw___1Baseline social_t_score___1Baseline pain_raw___1Baseline pain_t_score___1Baseline
	back_pain_vas___3month leg_pain_vas1___3month odi_score_pain_intensity___3mont odi_score_personal_care___3month odi_score_lifting___3month
		odi_score_walking___3month odi_score_sitting___3month odi_score_standing___3month odi_score_sleeping___3month odi_score_sex_life___3month
		odi_score_social_life___3month odi_score_travelling___3month total_odi_sum_score___3month ODI_percent_3months mobility___3month
		self_care___3month usual_activities___3month pain_discomfort___3month anxiety_depression___3month eq_vas___3month actual_qaly_score___3month
		promis29_pfa11___3month promis29_pfa21___3month promis29_pfa23___3month promis29_pfa53___3month promis29_edanx01___3month
		promis29_edanx40___3month promis29_edanx41___3month promis29_edanx53___3month promis29_eddep04___3month promis29_eddep06___3month
		promis29_eddep29___3month promis29_eddep41___3month promis29_hi7___3month promis29_an3___3month promis29_fatexp41___3month
		promis29_fatexp40___3month promis29_sleep109___3month promis29_sleep116___3month promis29_sleep20___3month promis29_sleep44___3month
		promis29_srpper11_caps___3month promis29_srpper18_caps___3month promis29_srpper23_caps___3month promis29_srpper46_caps___3month
		promis29_painin9___3month promis29_painin22___3month promis29_painin31___3month promis29_painin34___3month promis29_global07___3month
		pf_raw___3month pf_t_score___3month anxiety_raw___3month anxiety_t_score___3month depression_raw___3month depression_t_score___3month
		fatigue_raw___3month fatigue_t_score___3month sd_raw___3month sd_t_score___3month social_raw___3month social_t_score___3month
		pain_raw___3month pain_t_score___3month
	back_pain_vas__12month leg_pain_vas1__12month odi_score_pain_intensity__12mont odi_score_personal_care__12month odi_score_lifting__12month
		odi_score_walking__12month odi_score_sitting__12month odi_score_standing__12month odi_score_sleeping__12month odi_score_sex_life__12month
		odi_score_social_life__12month odi_score_travelling__12month total_odi_sum_score__12month ODI_percent_12months mobility__12month
		self_care__12month usual_activities__12month pain_discomfort__12month anxiety_depression__12month eq_vas__12month
		actual_qaly_score__12month promis29_pfa11__12month promis29_pfa21__12month promis29_pfa23__12month promis29_pfa53__12month
		promis29_edanx01__12month promis29_edanx40__12month promis29_edanx41__12month promis29_edanx53__12month promis29_eddep04__12month
		promis29_eddep06__12month promis29_eddep29__12month promis29_eddep41__12month promis29_hi7__12month promis29_an3__12month
		promis29_fatexp41__12month promis29_fatexp40__12month promis29_sleep109__12month promis29_sleep116__12month promis29_sleep20__12month
		promis29_sleep44__12month promis29_srpper11_caps__12month promis29_srpper18_caps__12month promis29_srpper23_caps__12month
		promis29_srpper46_caps__12month promis29_painin9__12month promis29_painin22__12month promis29_painin31__12month promis29_painin34__12month
		promis29_global07__12month pf_raw__12month pf_t_score__12month anxiety_raw__12month anxiety_t_score__12month depression_raw__12month
		depression_t_score__12month fatigue_raw__12month fatigue_t_score__12month sd_raw__12month sd_t_score__12month social_raw__12month
		social_t_score__12month pain_raw__12month pain_t_score__12month;
run;

data want;
set long;
scorenum = input(substr(scan(_name_,1,"_"),6),2.);
if scan(_name_,2,"_") = "baseline"
then timepoint = 0;
else timepoint = input(compress(scan(_name_,-1,"_"),"month"),2.);
drop _name_;
run;

Ideally, I the final product would look like below but I am not sure if this is possible given the data. Thank you 

 

data want;
input
pt_study_deidentified timepoint $ back_pain_vas leg_pain_vas1;
datalines;
1 baseline 2 3
1 3month 3 4
1 12month 5 5
2 baseline 2 3
2 3month 3 4
2 12month 5 5
;

 

Kurt_Bremser
Super User

First of all, you do not have score numbers throughout (e.g. pain_t_score).

But you can extract everything before a sequence of two underlines, and get the time point from the rest:

data long;
input _name_ $32.;
datalines;
back_pain_vas___1Baseline
leg_pain_vas1___1Baseline
back_pain_vas___3month
leg_pain_vas1___3month
back_pain_vas__12month
leg_pain_vas1__12month
;

data want;
set long;
i = index(_name_,"__");
study = substr(_name_,1,i-1);
select (upcase(substr(_name_,i+2,3)));
  when ("_1B") timepoint = 1;
  when ("_3M") timepoint = 3;
  when ("12M") timepoint = 12;
  otherwise; /* insert alarm code here if needed */
end;
drop i;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1242 views
  • 2 likes
  • 3 in conversation