DATA Step, Macro, Functions and more

convert wide format data to long format with loop structure

Accepted Solution Solved
Reply
Super Contributor
Posts: 318
Accepted Solution

convert wide format data to long format with loop structure

[ Edited ]

I have a data set in wide format, and want to convert to long format. Each subject has (2+32*36) variables as follow:

subjid, visit_num, 32 set of the following 36 variables

DTAMT1, DTAMO1
DTA1_BOP_DF DTA1_BOP_DL DTA1_BOP_F DTA1_BOP_L DTA1_BOP_MF DTA1_BOP_ML
DTA1_CAL_DF DTA1_CAL_DL DTA1_CAL_F DTA1_CAL_L DTA1_CAL_MF DTA1_CAL_ML
DTA1_PDD_DF DTA1_PDD_DL DTA1_PDD_F DTA1_PDD_L DTA1_PDD_MF DTA1_PDD_ML
DTA1_REC_DF DTA1_REC_DL DTA1_REC_F DTA1_REC_L DTA1_REC_MF DTA1_REC_ML
DTA1_PI_DF DTA1_PI_DL DTA1_PI_F DTA1_PI_L DTA1_PI_MF DTA1_PI_ML
DTA1_GI_DF DTA1_GI_F DTA1_GI_L DTA1_GI_MF

.....

DTAMT32, DTAMO32
DTA32_BOP_DF DTA32_BOP_DL DTA32_BOP_F DTA32_BOP_L DTA32_BOP_MF DTA32_BOP_ML
DTA32_CAL_DF DTA32_CAL_DL DTA32_CAL_F DTA32_CAL_L DTA32_CAL_MF DTA32_CAL_ML
DTA32_PDD_DF DTA32_PDD_DL DTA32_PDD_F DTA32_PDD_L DTA32_PDD_MF DTA32_PDD_ML
DTA32_REC_DF DTA32_REC_DL DTA32_REC_F DTA32_REC_L DTA32_REC_MF DTA32_REC_ML
DTA32_PI_DF DTA32_PI_DL DTA32_PI_F DTA32_PI_L DTA32_PI_MF DTA32_PI_ML
DTA32_GI_DF DTA32_GI_F DTA32_GI_L DTA32_GI_MF

 

The number after DTA is tooth_number (1,2, ..... 32), the letters in the end (DF, DL, F, L, MF and ML) are tooth_sites (total 6), and the letters in the middle are measure variables (total 6). Each subject has 32 teeth and each tooth has 6 sites, and eash site has 6 measures. 

 

I want to convert the wide format data to long format data, as shown in the following picture.

Capture.PNG

 

With the following code, I got something close, but with some issues:

 

data test;
	set mydata;
	array dta dta:;

	do over dta;
		var_name=vname(dta);
		if index(var_name, '_') > 0 then do;
			_tooth=scan(var_name, 1, '_');
			tooth_num=substr(_tooth, 4);
			_var=scan(var_name, 2, '_');
			tooth_site=scan(var_name, 3, '_');

			if _var='BOP' then BOP=dta;output; 
			if _var='CAL' then CAL=dta;output;
			if _var='REC' then REC=dta;output;
			if _var='PDD' then PDD=dta;output;
			if _var='PI' then PI=dta;output;
			if _var='GI' then GI=dta;output;
		end;
		if index(var_name, 'DTAMT') > 0 then DTAMT=dta;
		if index(var_name, 'DTAMO') > 0 then DTAMO=dta;
	end;
	keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

probably the problem is due to the output after each if statement. Can anyone give some suggestions on my code or other methods? Thank you in advance.


Accepted Solutions
Solution
‎02-23-2017 09:20 PM
PROC Star
Posts: 7,474

Re: convert wide format data to long format with loop structure

Posted in reply to fengyuwuzu

I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:

 

data test;
	set mydata;
	array dta(*) dta:;

	do i=1 to dim(dta);
      var_name=vname(dta(i));
      if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i);
      else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i);
      else do;
		if index(var_name, '_') > 0 then do;
          tooth_num=substr(scan(var_name, 1, '_'), 4);
          _var=scan(var_name, 2, '_');
          tooth_site=scan(var_name, 3, '_');
          if _var='BOP' then BOP=dta(i); 
          else if _var='CAL' then CAL=dta(i);
          else if _var='REC' then REC=dta(i);
          else if _var='PDD' then PDD=dta(i);
          else if _var='PI' then PI=dta(i);
          else if _var='GI' then GI=dta(i);
          output;
        end;
      end;
    end;
    keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

proc sort data=test;
  by Subjid visit_num tooth_num tooth_site;
run;

data want ;
  update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) 
         test;
  by Subjid visit_num tooth_num tooth_site;
run;

HTH,

Art, CEO, AnalystFinder.com

View solution in original post


All Replies
PROC Star
Posts: 307

Re: convert wide format data to long format with loop structure

Posted in reply to fengyuwuzu

Why not just transpose all the "DTA" variables by ID, etc. and then extract the information you need from the _NAME_ variable?

Super User
Posts: 19,815

Re: convert wide format data to long format with loop structure

Posted in reply to fengyuwuzu

I may be missing something but wouldn't PROC TRANSPOSE do 99% of what you want here. You'll probably have to do a dataset to clean up some of the variables anyways, but it should get you really close to what you want.

 

 

PROC Star
Posts: 307

Re: convert wide format data to long format with loop structure

[ Edited ]

I like this comment because it agrees with mine!

 

And if the naming convention of the variables is as shown, you could get away with a simple "var dta:;" in the transpose.

Super Contributor
Posts: 318

Re: convert wide format data to long format with loop structure

Yes, I also tried proc transpose by id and visitnum, and everything is in the col1 column; from there I can extract different variables. But I also want to figure out why the code using array did not work. Thank you for your comments.
Solution
‎02-23-2017 09:20 PM
PROC Star
Posts: 7,474

Re: convert wide format data to long format with loop structure

Posted in reply to fengyuwuzu

I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:

 

data test;
	set mydata;
	array dta(*) dta:;

	do i=1 to dim(dta);
      var_name=vname(dta(i));
      if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i);
      else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i);
      else do;
		if index(var_name, '_') > 0 then do;
          tooth_num=substr(scan(var_name, 1, '_'), 4);
          _var=scan(var_name, 2, '_');
          tooth_site=scan(var_name, 3, '_');
          if _var='BOP' then BOP=dta(i); 
          else if _var='CAL' then CAL=dta(i);
          else if _var='REC' then REC=dta(i);
          else if _var='PDD' then PDD=dta(i);
          else if _var='PI' then PI=dta(i);
          else if _var='GI' then GI=dta(i);
          output;
        end;
      end;
    end;
    keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

proc sort data=test;
  by Subjid visit_num tooth_num tooth_site;
run;

data want ;
  update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) 
         test;
  by Subjid visit_num tooth_num tooth_site;
run;

HTH,

Art, CEO, AnalystFinder.com

Super Contributor
Posts: 318

Re: convert wide format data to long format with loop structure

Thank you very much, Art!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 179 views
  • 2 likes
  • 4 in conversation