BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
CorinneT
Obsidian | Level 7

Hi, 

Please could you assist me regarding the fictious program below (there is only one variable to transpose). It's not work. I would to have one row by id (patient) without use proc TRANSPOSE. Because, I would like transpose more than 3 variables by patient.

 

data A;
length id 8. ;
input id Date_exam;
format Date_exam ddmmyy10.;
informat Date_exam ddmmyy10.;
datalines;
0001 01/10/2020
0002 11/10/2001
0003 16/09/2010
0004 30/01/1960
0001 02/10/2020
0002 15/10/2001
0002 15/10/2002
0003 16/09/2010
0004 30/01/1960
0003 16/09/2004
0003 16/09/2018
0005 06/09/2017
;
run ;

proc sort data=A;by id Date_exam; run ;

%let i=1;
data B; set A;
by id;
retain id Date&i;
array date[1:4];
do i=1 to 4;
Date&i=Date_exam;
output;
end;
run ;

 

I don't have one row by patient for the variable Date_exam. Please, not use Proc TRANSPOSE. I know this row, it's ok for me but not efficience when you have more yhan 3 variables.

Thank you so much in advance.

Corinne 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
In PROC TRANSPOSE you do it once for each variable and then merge the results. But it's fully dynamic and you don't need to worry if the number of dates changes. If you have 6 maximum dates next time you need to update your code or design it so that the data step does that ahead of time, which adds another layer of complexity to this code.

In the data step, you have to handle all of the retaining manually so it is more complex.

The steps are:
1. Sort to ensure correct order
2. Declare an array with a maximum dimensions required so you can use index references - need this information before the step
3. At the start of each ID, initialize all your dates to missing and set your counter to 0
4. Increment counter at each date
5. For each date, assign it to the correct array variable - use RETAIN to hold values across rows.
6. If last record then output to data set
7. Format dates so they appear as SAS dates
8. Repeat steps 2 & 5 for each variable that requires transposing

Transpose solution:
1. Sort
2. Transpose Var1
3. Transpose Var2
4. Transpose Var3
5. Merge results
6. Format data as required

Which one is less error prone and easier to code?

View solution in original post

9 REPLIES 9
Reeza
Super User

You're right it's tedious but not as much work as you think since you need to declare two more arrays and assignments/clearing.

You don't need the macro variables though, that's what the array indices are for. 

 

 

Here's some transposing data tutorials, not sure why you don't want to use PROC TRANSPOSE since it's more efficient in the long run. Another thing to consider - look back at how this data was created and see if there's a way to change it.


Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd

 

proc sort data=A;
	by id Date_exam;
run;

data B;
	set A;
	by id;
	retain id Date1-Date4;
	format date1-date4 date9.;
	array date[1:4];

	if first.id then
		do;
			count=0;
			call missing(of date(*));
		end;
	count+1;
	Date[count]=Date_exam;

	if last.id then
		output;
run;

@CorinneT wrote:

Hi, 

Please could you assist me regarding the fictious program below (there is only one variable to transpose). It's not work. I would to have one row by id (patient) without use proc TRANSPOSE. Because, I would like transpose more than 3 variables by patient.

 

data A;
length id 8. ;
input id Date_exam;
format Date_exam ddmmyy10.;
informat Date_exam ddmmyy10.;
datalines;
0001 01/10/2020
0002 11/10/2001
0003 16/09/2010
0004 30/01/1960
0001 02/10/2020
0002 15/10/2001
0002 15/10/2002
0003 16/09/2010
0004 30/01/1960
0003 16/09/2004
0003 16/09/2018
0005 06/09/2017
;
run ;

proc sort data=A;by id Date_exam; run ;

%let i=1;
data B; set A;
by id;
retain id Date&i;
array date[1:4];
do i=1 to 4;
Date&i=Date_exam;
output;
end;
run ;

 

I don't have one row by patient for the variable Date_exam. Please, not use Proc TRANSPOSE. I know this row, it's ok for me but not efficience when you have more yhan 3 variables.

Thank you so much in advance.

Corinne 


 

 

Reeza
Super User
Note that in this method you need to know ahead of time, what the max number of dates/variable2/variable3 that you need to account for. With the PROC TRANSPOSE approach you do not need to know this ahead of time.

CorinneT
Obsidian | Level 7
Thanks, this program runs. But it's quit complicated. I will follow your tip regarding the Proc TRANSPOSE. But my concern with Proc Transpose when you have to transpose more than 3 variables in one row for each observation (patient Id) it isn't easy with Proc TRANSPOSE.
Reeza
Super User
In PROC TRANSPOSE you do it once for each variable and then merge the results. But it's fully dynamic and you don't need to worry if the number of dates changes. If you have 6 maximum dates next time you need to update your code or design it so that the data step does that ahead of time, which adds another layer of complexity to this code.

In the data step, you have to handle all of the retaining manually so it is more complex.

The steps are:
1. Sort to ensure correct order
2. Declare an array with a maximum dimensions required so you can use index references - need this information before the step
3. At the start of each ID, initialize all your dates to missing and set your counter to 0
4. Increment counter at each date
5. For each date, assign it to the correct array variable - use RETAIN to hold values across rows.
6. If last record then output to data set
7. Format dates so they appear as SAS dates
8. Repeat steps 2 & 5 for each variable that requires transposing

Transpose solution:
1. Sort
2. Transpose Var1
3. Transpose Var2
4. Transpose Var3
5. Merge results
6. Format data as required

Which one is less error prone and easier to code?
CorinneT
Obsidian | Level 7
Thanks so much regarding the steps of ARRAY solution. It's more clear.
Yes, the Proc TRANSPOSE prone less error. What's I'm not like with Proc TRANSPOSE is that you have to transpose the variables one after the other. Dans after mrege each tables.
however, you are right Proc TRANSPOSE seems to be method least error prone to code.
Tom
Super User Tom
Super User

Just use PROC SUMMARY. 

data have ;
  input id Date_exam;
  informat Date_exam ddmmyy.;
  format Date_exam ddmmyy10.;
  days_ago = today()-Date_exam;
datalines;
0001 01/10/2020
0002 11/10/2001
0003 16/09/2010
0004 30/01/1960
0001 02/10/2020
0002 15/10/2001
0002 15/10/2002
0003 16/09/2010
0004 30/01/1960
0003 16/09/2004
0003 16/09/2018
0005 06/09/2017
;

proc summary data=have nway ;
  class id ;
  output out=want idgroup(out[5] (Date_exam days_ago)= ) ;
run;

Results:

                             Date_exam_   Date_exam_   Date_exam_   Date_exam_   Date_exam_   days_   days_   days_   days_   days_
Obs   id   _TYPE_   _FREQ_       1            2            3            4            5        ago_1   ago_2   ago_3   ago_4   ago_5

 1     1      1        2     01/10/2020   02/10/2020            .            .            .     393     392       .       .     .
 2     2      1        3     11/10/2001   15/10/2001   15/10/2002            .            .    7323    7319    6954       .     .
 3     3      1        4     16/09/2010   16/09/2010   16/09/2004   16/09/2018            .    4061    4061    6252    1139     .
 4     4      1        2     30/01/1960   30/01/1960            .            .            .   22553   22553       .       .     .
 5     5      1        1     06/09/2017            .            .            .            .    1514       .       .       .     .

You still need to set an upper bound on the number of copies you want out. So either pre-count or just live with your guess for the right number.

 

 

Reeza
Super User
Is PROC SUMMARY limited to only character variables though?
Tom
Super User Tom
Super User

@Reeza wrote:
Is PROC SUMMARY limited to only character variables though?

No.

proc summary data=sashelp.class nway ;
  class age ;
  output out=want idgroup(out[5] (Name Sex Height Weight)= ) ;
run;
Reeza
Super User
Nice! I always forget about that method for some reason 🙂

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Autotuning Deep Learning Models Using SAS

Follow along as SAS’ Robert Blanchard explains three aspects of autotuning in a deep learning context: globalized search, localized search and an in parallel method using SAS.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2723 views
  • 4 likes
  • 3 in conversation