I have a data set in wide format, and want to convert to long format. Each subject has (2+32*36) variables as follow:
subjid, visit_num, 32 set of the following 36 variables
DTAMT1, DTAMO1
DTA1_BOP_DF DTA1_BOP_DL DTA1_BOP_F DTA1_BOP_L DTA1_BOP_MF DTA1_BOP_ML
DTA1_CAL_DF DTA1_CAL_DL DTA1_CAL_F DTA1_CAL_L DTA1_CAL_MF DTA1_CAL_ML
DTA1_PDD_DF DTA1_PDD_DL DTA1_PDD_F DTA1_PDD_L DTA1_PDD_MF DTA1_PDD_ML
DTA1_REC_DF DTA1_REC_DL DTA1_REC_F DTA1_REC_L DTA1_REC_MF DTA1_REC_ML
DTA1_PI_DF DTA1_PI_DL DTA1_PI_F DTA1_PI_L DTA1_PI_MF DTA1_PI_ML
DTA1_GI_DF DTA1_GI_F DTA1_GI_L DTA1_GI_MF
.....
DTAMT32, DTAMO32
DTA32_BOP_DF DTA32_BOP_DL DTA32_BOP_F DTA32_BOP_L DTA32_BOP_MF DTA32_BOP_ML
DTA32_CAL_DF DTA32_CAL_DL DTA32_CAL_F DTA32_CAL_L DTA32_CAL_MF DTA32_CAL_ML
DTA32_PDD_DF DTA32_PDD_DL DTA32_PDD_F DTA32_PDD_L DTA32_PDD_MF DTA32_PDD_ML
DTA32_REC_DF DTA32_REC_DL DTA32_REC_F DTA32_REC_L DTA32_REC_MF DTA32_REC_ML
DTA32_PI_DF DTA32_PI_DL DTA32_PI_F DTA32_PI_L DTA32_PI_MF DTA32_PI_ML
DTA32_GI_DF DTA32_GI_F DTA32_GI_L DTA32_GI_MF
The number after DTA is tooth_number (1,2, ..... 32), the letters in the end (DF, DL, F, L, MF and ML) are tooth_sites (total 6), and the letters in the middle are measure variables (total 6). Each subject has 32 teeth and each tooth has 6 sites, and eash site has 6 measures.
I want to convert the wide format data to long format data, as shown in the following picture.
With the following code, I got something close, but with some issues:
data test;
set mydata;
array dta dta:;
do over dta;
var_name=vname(dta);
if index(var_name, '_') > 0 then do;
_tooth=scan(var_name, 1, '_');
tooth_num=substr(_tooth, 4);
_var=scan(var_name, 2, '_');
tooth_site=scan(var_name, 3, '_');
if _var='BOP' then BOP=dta;output;
if _var='CAL' then CAL=dta;output;
if _var='REC' then REC=dta;output;
if _var='PDD' then PDD=dta;output;
if _var='PI' then PI=dta;output;
if _var='GI' then GI=dta;output;
end;
if index(var_name, 'DTAMT') > 0 then DTAMT=dta;
if index(var_name, 'DTAMO') > 0 then DTAMO=dta;
end;
keep Subjid visit_num tooth_num tooth_site DTAMT DTAMO BOP CAL PDD REC PI GI;
run;
probably the problem is due to the output after each if statement. Can anyone give some suggestions on my code or other methods? Thank you in advance.
I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:
data test; set mydata; array dta(*) dta:; do i=1 to dim(dta); var_name=vname(dta(i)); if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i); else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i); else do; if index(var_name, '_') > 0 then do; tooth_num=substr(scan(var_name, 1, '_'), 4); _var=scan(var_name, 2, '_'); tooth_site=scan(var_name, 3, '_'); if _var='BOP' then BOP=dta(i); else if _var='CAL' then CAL=dta(i); else if _var='REC' then REC=dta(i); else if _var='PDD' then PDD=dta(i); else if _var='PI' then PI=dta(i); else if _var='GI' then GI=dta(i); output; end; end; end; keep Subjid visit_num tooth_num tooth_site DTAMT DTAMO BOP CAL PDD REC PI GI; run; proc sort data=test; by Subjid visit_num tooth_num tooth_site; run; data want ; update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) test; by Subjid visit_num tooth_num tooth_site; run;
HTH,
Art, CEO, AnalystFinder.com
Why not just transpose all the "DTA" variables by ID, etc. and then extract the information you need from the _NAME_ variable?
I may be missing something but wouldn't PROC TRANSPOSE do 99% of what you want here. You'll probably have to do a dataset to clean up some of the variables anyways, but it should get you really close to what you want.
I like this comment because it agrees with mine!
And if the naming convention of the variables is as shown, you could get away with a simple "var dta:;" in the transpose.
I, personally, don't have any problem with the approach you've taken. I think the following is all you need to complete the task:
data test; set mydata; array dta(*) dta:; do i=1 to dim(dta); var_name=vname(dta(i)); if index(var_name, 'DTAMT') > 0 then DTAMT=dta(i); else if index(var_name, 'DTAMO') > 0 then DTAMO=dta(i); else do; if index(var_name, '_') > 0 then do; tooth_num=substr(scan(var_name, 1, '_'), 4); _var=scan(var_name, 2, '_'); tooth_site=scan(var_name, 3, '_'); if _var='BOP' then BOP=dta(i); else if _var='CAL' then CAL=dta(i); else if _var='REC' then REC=dta(i); else if _var='PDD' then PDD=dta(i); else if _var='PI' then PI=dta(i); else if _var='GI' then GI=dta(i); output; end; end; end; keep Subjid visit_num tooth_num tooth_site DTAMT DTAMO BOP CAL PDD REC PI GI; run; proc sort data=test; by Subjid visit_num tooth_num tooth_site; run; data want ; update test (obs=0 keep=Subjid visit_num tooth_num tooth_site) test; by Subjid visit_num tooth_num tooth_site; run;
HTH,
Art, CEO, AnalystFinder.com
Thank you very much, Art!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.