BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fengyuwuzu
Pyrite | Level 9

I have a data set in wide format, and want to convert to long format. Each subject has (2+32*36) variables as follow:

subjid, visit_num, 32 set of the following 36 variables

DTAMT1, DTAMO1
DTA1_BOP_DF DTA1_BOP_DL DTA1_BOP_F DTA1_BOP_L DTA1_BOP_MF DTA1_BOP_ML
DTA1_CAL_DF DTA1_CAL_DL DTA1_CAL_F DTA1_CAL_L DTA1_CAL_MF DTA1_CAL_ML
DTA1_PDD_DF DTA1_PDD_DL DTA1_PDD_F DTA1_PDD_L DTA1_PDD_MF DTA1_PDD_ML
DTA1_REC_DF DTA1_REC_DL DTA1_REC_F DTA1_REC_L DTA1_REC_MF DTA1_REC_ML
DTA1_PI_DF DTA1_PI_DL DTA1_PI_F DTA1_PI_L DTA1_PI_MF DTA1_PI_ML
DTA1_GI_DF DTA1_GI_F DTA1_GI_L DTA1_GI_MF

.....

DTAMT32, DTAMO32
DTA32_BOP_DF DTA32_BOP_DL DTA32_BOP_F DTA32_BOP_L DTA32_BOP_MF DTA32_BOP_ML
DTA32_CAL_DF DTA32_CAL_DL DTA32_CAL_F DTA32_CAL_L DTA32_CAL_MF DTA32_CAL_ML
DTA32_PDD_DF DTA32_PDD_DL DTA32_PDD_F DTA32_PDD_L DTA32_PDD_MF DTA32_PDD_ML
DTA32_REC_DF DTA32_REC_DL DTA32_REC_F DTA32_REC_L DTA32_REC_MF DTA32_REC_ML
DTA32_PI_DF DTA32_PI_DL DTA32_PI_F DTA32_PI_L DTA32_PI_MF DTA32_PI_ML
DTA32_GI_DF DTA32_GI_F DTA32_GI_L DTA32_GI_MF

 

The number after DTA is tooth_number (1,2, ..... 32), the letters in the end (DF, DL, F, L, MF and ML) are tooth_sites (total 6), and the letters in the middle are measure variables (total 6). Each subject has 32 teeth and each tooth has 6 sites, and eash site has 6 measures. 

 

I want to convert the wide format data to long format data, as shown in the following picture.

Capture.PNG

 

With the following code, I got something close, but with some issues:

 

data test;
	set mydata;
	array dta dta:;

	do over dta;
		var_name=vname(dta);
		if index(var_name, '_') > 0 then do;
			_tooth=scan(var_name, 1, '_');
			tooth_num=substr(_tooth, 4);
			_var=scan(var_name, 2, '_');
			tooth_site=scan(var_name, 3, '_');

			if _var='BOP' then BOP=dta;output; 
			if _var='CAL' then CAL=dta;output;
			if _var='REC' then REC=dta;output;
			if _var='PDD' then PDD=dta;output;
			if _var='PI' then PI=dta;output;
			if _var='GI' then GI=dta;output;
		end;
		if index(var_name, 'DTAMT') > 0 then DTAMT=dta;
		if index(var_name, 'DTAMO') > 0 then DTAMO=dta;
	end;
	keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

probably the problem is due to the output after each if statement. Can anyone give some suggestions on my code or other methods? Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:

 

data test;
	set mydata;
	array dta(*) dta:;

	do i=1 to dim(dta);
      var_name=vname(dta(i));
      if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i);
      else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i);
      else do;
		if index(var_name, '_') > 0 then do;
          tooth_num=substr(scan(var_name, 1, '_'), 4);
          _var=scan(var_name, 2, '_');
          tooth_site=scan(var_name, 3, '_');
          if _var='BOP' then BOP=dta(i); 
          else if _var='CAL' then CAL=dta(i);
          else if _var='REC' then REC=dta(i);
          else if _var='PDD' then PDD=dta(i);
          else if _var='PI' then PI=dta(i);
          else if _var='GI' then GI=dta(i);
          output;
        end;
      end;
    end;
    keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

proc sort data=test;
  by Subjid visit_num tooth_num tooth_site;
run;

data want ;
  update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) 
         test;
  by Subjid visit_num tooth_num tooth_site;
run;

HTH,

Art, CEO, AnalystFinder.com

View solution in original post

6 REPLIES 6
collinelliot
Barite | Level 11

Why not just transpose all the "DTA" variables by ID, etc. and then extract the information you need from the _NAME_ variable?

Reeza
Super User

I may be missing something but wouldn't PROC TRANSPOSE do 99% of what you want here. You'll probably have to do a dataset to clean up some of the variables anyways, but it should get you really close to what you want.

 

 

collinelliot
Barite | Level 11

I like this comment because it agrees with mine!

 

And if the naming convention of the variables is as shown, you could get away with a simple "var dta:;" in the transpose.

fengyuwuzu
Pyrite | Level 9
Yes, I also tried proc transpose by id and visitnum, and everything is in the col1 column; from there I can extract different variables. But I also want to figure out why the code using array did not work. Thank you for your comments.
art297
Opal | Level 21

I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:

 

data test;
	set mydata;
	array dta(*) dta:;

	do i=1 to dim(dta);
      var_name=vname(dta(i));
      if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i);
      else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i);
      else do;
		if index(var_name, '_') > 0 then do;
          tooth_num=substr(scan(var_name, 1, '_'), 4);
          _var=scan(var_name, 2, '_');
          tooth_site=scan(var_name, 3, '_');
          if _var='BOP' then BOP=dta(i); 
          else if _var='CAL' then CAL=dta(i);
          else if _var='REC' then REC=dta(i);
          else if _var='PDD' then PDD=dta(i);
          else if _var='PI' then PI=dta(i);
          else if _var='GI' then GI=dta(i);
          output;
        end;
      end;
    end;
    keep Subjid visit_num tooth_num tooth_site  DTAMT DTAMO BOP CAL PDD REC PI GI;
run;

proc sort data=test;
  by Subjid visit_num tooth_num tooth_site;
run;

data want ;
  update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) 
         test;
  by Subjid visit_num tooth_num tooth_site;
run;

HTH,

Art, CEO, AnalystFinder.com

fengyuwuzu
Pyrite | Level 9

Thank you very much, Art!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1266 views
  • 2 likes
  • 4 in conversation