BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fengyuwuzu
Pyrite | Level 9

I have a data set in wide format, and want to convert to long format. Each subject has (2+32*36) variables as follow:

subjid, visit_num, 32 set of the following 36 variables

DTAMT1, DTAMO1
DTA1_BOP_DF DTA1_BOP_DL DTA1_BOP_F DTA1_BOP_L DTA1_BOP_MF DTA1_BOP_ML
DTA1_CAL_DF DTA1_CAL_DL DTA1_CAL_F DTA1_CAL_L DTA1_CAL_MF DTA1_CAL_ML
DTA1_PDD_DF DTA1_PDD_DL DTA1_PDD_F DTA1_PDD_L DTA1_PDD_MF DTA1_PDD_ML
DTA1_REC_DF DTA1_REC_DL DTA1_REC_F DTA1_REC_L DTA1_REC_MF DTA1_REC_ML
DTA1_PI_DF DTA1_PI_DL DTA1_PI_F DTA1_PI_L DTA1_PI_MF DTA1_PI_ML
DTA1_GI_DF DTA1_GI_F DTA1_GI_L DTA1_GI_MF

.....

DTAMT32, DTAMO32
DTA32_BOP_DF DTA32_BOP_DL DTA32_BOP_F DTA32_BOP_L DTA32_BOP_MF DTA32_BOP_ML
DTA32_CAL_DF DTA32_CAL_DL DTA32_CAL_F DTA32_CAL_L DTA32_CAL_MF DTA32_CAL_ML
DTA32_PDD_DF DTA32_PDD_DL DTA32_PDD_F DTA32_PDD_L DTA32_PDD_MF DTA32_PDD_ML
DTA32_REC_DF DTA32_REC_DL DTA32_REC_F DTA32_REC_L DTA32_REC_MF DTA32_REC_ML
DTA32_PI_DF DTA32_PI_DL DTA32_PI_F DTA32_PI_L DTA32_PI_MF DTA32_PI_ML
DTA32_GI_DF DTA32_GI_F DTA32_GI_L DTA32_GI_MF

 

The number after DTA is tooth_number (1,2, ..... 32), the letters in the end (DF, DL, F, L, MF and ML) are tooth_sites (total 6), and the letters in the middle are measure variables (total 6). Each subject has 32 teeth and each tooth has 6 sites, and eash site has 6 measures. 

 

I want to convert the wide format data to long format data, as shown in the following picture.

Capture.PNG

 

With the following code, I got something close, but with some issues:

 

data test;
	set mydata;
	array dta dta:;

	do over dta;
		var_name=vname(dta);
		if index(var_name, '_') > 0 then do;
			_tooth=scan(var_name, 1, '_');
			tooth_num=substr(_tooth, 4);
			_var=scan(var_name, 2, '_');
			tooth_site=scan(var_name, 3, '_');

			if _var='BOP' then BOP=dta;output; 
			if _var='CAL' then CAL=dta;output;
			if _var='REC' then REC=dta;output;
			if _var='PDD' then PDD=dta;output;
			if _var='PI' then PI=dta;output;
			if _var='GI' then GI=dta;output;
		end;
		if index(var_name, 'DTAMT') > 0 then DTAMT=dta;
		if index(var_name, 'DTAMO') > 0 then DTAMO=dta;
	end;
	keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

probably the problem is due to the output after each if statement. Can anyone give some suggestions on my code or other methods? Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:

 

data test;
	set mydata;
	array dta(*) dta:;

	do i=1 to dim(dta);
      var_name=vname(dta(i));
      if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i);
      else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i);
      else do;
		if index(var_name, '_') > 0 then do;
          tooth_num=substr(scan(var_name, 1, '_'), 4);
          _var=scan(var_name, 2, '_');
          tooth_site=scan(var_name, 3, '_');
          if _var='BOP' then BOP=dta(i); 
          else if _var='CAL' then CAL=dta(i);
          else if _var='REC' then REC=dta(i);
          else if _var='PDD' then PDD=dta(i);
          else if _var='PI' then PI=dta(i);
          else if _var='GI' then GI=dta(i);
          output;
        end;
      end;
    end;
    keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

proc sort data=test;
  by Subjid visit_num tooth_num tooth_site;
run;

data want ;
  update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) 
         test;
  by Subjid visit_num tooth_num tooth_site;
run;

HTH,

Art, CEO, AnalystFinder.com

View solution in original post

6 REPLIES 6
collinelliot
Barite | Level 11

Why not just transpose all the "DTA" variables by ID, etc. and then extract the information you need from the _NAME_ variable?

Reeza
Super User

I may be missing something but wouldn't PROC TRANSPOSE do 99% of what you want here. You'll probably have to do a dataset to clean up some of the variables anyways, but it should get you really close to what you want.

 

 

collinelliot
Barite | Level 11

I like this comment because it agrees with mine!

 

And if the naming convention of the variables is as shown, you could get away with a simple "var dta:;" in the transpose.

fengyuwuzu
Pyrite | Level 9
Yes, I also tried proc transpose by id and visitnum, and everything is in the col1 column; from there I can extract different variables. But I also want to figure out why the code using array did not work. Thank you for your comments.
art297
Opal | Level 21

I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:

 

data test;
	set mydata;
	array dta(*) dta:;

	do i=1 to dim(dta);
      var_name=vname(dta(i));
      if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i);
      else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i);
      else do;
		if index(var_name, '_') > 0 then do;
          tooth_num=substr(scan(var_name, 1, '_'), 4);
          _var=scan(var_name, 2, '_');
          tooth_site=scan(var_name, 3, '_');
          if _var='BOP' then BOP=dta(i); 
          else if _var='CAL' then CAL=dta(i);
          else if _var='REC' then REC=dta(i);
          else if _var='PDD' then PDD=dta(i);
          else if _var='PI' then PI=dta(i);
          else if _var='GI' then GI=dta(i);
          output;
        end;
      end;
    end;
    keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

proc sort data=test;
  by Subjid visit_num tooth_num tooth_site;
run;

data want ;
  update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) 
         test;
  by Subjid visit_num tooth_num tooth_site;
run;

HTH,

Art, CEO, AnalystFinder.com

fengyuwuzu
Pyrite | Level 9

Thank you very much, Art!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1674 views
  • 2 likes
  • 4 in conversation