DATA Step, Macro, Functions and more

convert wide format data to long format with loop structure

Accepted Solution Solved
Reply
Super Contributor
Posts: 312
Accepted Solution

convert wide format data to long format with loop structure

[ Edited ]

I have a data set in wide format, and want to convert to long format. Each subject has (2+32*36) variables as follow:

subjid, visit_num, 32 set of the following 36 variables

DTAMT1, DTAMO1
DTA1_BOP_DF DTA1_BOP_DL DTA1_BOP_F DTA1_BOP_L DTA1_BOP_MF DTA1_BOP_ML
DTA1_CAL_DF DTA1_CAL_DL DTA1_CAL_F DTA1_CAL_L DTA1_CAL_MF DTA1_CAL_ML
DTA1_PDD_DF DTA1_PDD_DL DTA1_PDD_F DTA1_PDD_L DTA1_PDD_MF DTA1_PDD_ML
DTA1_REC_DF DTA1_REC_DL DTA1_REC_F DTA1_REC_L DTA1_REC_MF DTA1_REC_ML
DTA1_PI_DF DTA1_PI_DL DTA1_PI_F DTA1_PI_L DTA1_PI_MF DTA1_PI_ML
DTA1_GI_DF DTA1_GI_F DTA1_GI_L DTA1_GI_MF

.....

DTAMT32, DTAMO32
DTA32_BOP_DF DTA32_BOP_DL DTA32_BOP_F DTA32_BOP_L DTA32_BOP_MF DTA32_BOP_ML
DTA32_CAL_DF DTA32_CAL_DL DTA32_CAL_F DTA32_CAL_L DTA32_CAL_MF DTA32_CAL_ML
DTA32_PDD_DF DTA32_PDD_DL DTA32_PDD_F DTA32_PDD_L DTA32_PDD_MF DTA32_PDD_ML
DTA32_REC_DF DTA32_REC_DL DTA32_REC_F DTA32_REC_L DTA32_REC_MF DTA32_REC_ML
DTA32_PI_DF DTA32_PI_DL DTA32_PI_F DTA32_PI_L DTA32_PI_MF DTA32_PI_ML
DTA32_GI_DF DTA32_GI_F DTA32_GI_L DTA32_GI_MF

 

The number after DTA is tooth_number (1,2, ..... 32), the letters in the end (DF, DL, F, L, MF and ML) are tooth_sites (total 6), and the letters in the middle are measure variables (total 6). Each subject has 32 teeth and each tooth has 6 sites, and eash site has 6 measures. 

 

I want to convert the wide format data to long format data, as shown in the following picture.

Capture.PNG

 

With the following code, I got something close, but with some issues:

 

data test;
	set mydata;
	array dta dta:;

	do over dta;
		var_name=vname(dta);
		if index(var_name, '_') > 0 then do;
			_tooth=scan(var_name, 1, '_');
			tooth_num=substr(_tooth, 4);
			_var=scan(var_name, 2, '_');
			tooth_site=scan(var_name, 3, '_');

			if _var='BOP' then BOP=dta;output; 
			if _var='CAL' then CAL=dta;output;
			if _var='REC' then REC=dta;output;
			if _var='PDD' then PDD=dta;output;
			if _var='PI' then PI=dta;output;
			if _var='GI' then GI=dta;output;
		end;
		if index(var_name, 'DTAMT') > 0 then DTAMT=dta;
		if index(var_name, 'DTAMO') > 0 then DTAMO=dta;
	end;
	keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

probably the problem is due to the output after each if statement. Can anyone give some suggestions on my code or other methods? Thank you in advance.


Accepted Solutions
Solution
‎02-23-2017 09:20 PM
PROC Star
Posts: 7,364

Re: convert wide format data to long format with loop structure

I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:

 

data test;
	set mydata;
	array dta(*) dta:;

	do i=1 to dim(dta);
      var_name=vname(dta(i));
      if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i);
      else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i);
      else do;
		if index(var_name, '_') > 0 then do;
          tooth_num=substr(scan(var_name, 1, '_'), 4);
          _var=scan(var_name, 2, '_');
          tooth_site=scan(var_name, 3, '_');
          if _var='BOP' then BOP=dta(i); 
          else if _var='CAL' then CAL=dta(i);
          else if _var='REC' then REC=dta(i);
          else if _var='PDD' then PDD=dta(i);
          else if _var='PI' then PI=dta(i);
          else if _var='GI' then GI=dta(i);
          output;
        end;
      end;
    end;
    keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

proc sort data=test;
  by Subjid visit_num tooth_num tooth_site;
run;

data want ;
  update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) 
         test;
  by Subjid visit_num tooth_num tooth_site;
run;

HTH,

Art, CEO, AnalystFinder.com

View solution in original post


All Replies
PROC Star
Posts: 290

Re: convert wide format data to long format with loop structure

Why not just transpose all the "DTA" variables by ID, etc. and then extract the information you need from the _NAME_ variable?

Super User
Posts: 17,912

Re: convert wide format data to long format with loop structure

I may be missing something but wouldn't PROC TRANSPOSE do 99% of what you want here. You'll probably have to do a dataset to clean up some of the variables anyways, but it should get you really close to what you want.

 

 

PROC Star
Posts: 290

Re: convert wide format data to long format with loop structure

[ Edited ]

I like this comment because it agrees with mine!

 

And if the naming convention of the variables is as shown, you could get away with a simple "var dta:;" in the transpose.

Super Contributor
Posts: 312

Re: convert wide format data to long format with loop structure

Yes, I also tried proc transpose by id and visitnum, and everything is in the col1 column; from there I can extract different variables. But I also want to figure out why the code using array did not work. Thank you for your comments.
Solution
‎02-23-2017 09:20 PM
PROC Star
Posts: 7,364

Re: convert wide format data to long format with loop structure

I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:

 

data test;
	set mydata;
	array dta(*) dta:;

	do i=1 to dim(dta);
      var_name=vname(dta(i));
      if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i);
      else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i);
      else do;
		if index(var_name, '_') > 0 then do;
          tooth_num=substr(scan(var_name, 1, '_'), 4);
          _var=scan(var_name, 2, '_');
          tooth_site=scan(var_name, 3, '_');
          if _var='BOP' then BOP=dta(i); 
          else if _var='CAL' then CAL=dta(i);
          else if _var='REC' then REC=dta(i);
          else if _var='PDD' then PDD=dta(i);
          else if _var='PI' then PI=dta(i);
          else if _var='GI' then GI=dta(i);
          output;
        end;
      end;
    end;
    keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

proc sort data=test;
  by Subjid visit_num tooth_num tooth_site;
run;

data want ;
  update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) 
         test;
  by Subjid visit_num tooth_num tooth_site;
run;

HTH,

Art, CEO, AnalystFinder.com

Super Contributor
Posts: 312

Re: convert wide format data to long format with loop structure

Thank you very much, Art!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 161 views
  • 2 likes
  • 4 in conversation