PROC TRANSPOSE with dates as column heades

Jumboshrimps · Posted 07-08-2021 05:59 PM

I've seen a couple responses to proc transpose with dates on this site, but none of the solutions worked as I have more than one date per unique ID (pat_id) My first proc transpose from the code below worked perfectly, from the link

https://sasexamplecode.com/complete-guide-to-proc-transpose-in-sas/

producing the screen shot below:

However, each category (ID), appears only ONE time per unique ID (pat_id) - the BY variable.

Proc transpose performs perfectly, after pre-sorting the data set and nodupkey.

PROC TRANSPOSE DATA = HTN_MEDS_START_NU_2 out = htn_meDS_START_NU3
LABEL=VAR_LABEL ;
BY pat_id NOTSORTED;
VAR MED ;
ID CATEGORY;

PAT_ID	THIAZIDE	LOOP	CCB	ACE	ARB	ALPHA_BLKR	BETA_BLKR
211	1
423		1	1
423	1
634			1	1
846	1		1
846					1
1057	1				1	1
1057						1
1269						1	1
1480					1
1480						1	1
1480	1
1691	1				1		1
1903					1
1903				1	1
1903	1
2114	1					1
2326	1
2326	1
2537						1
2748	1				1
2748	1
2748	1
2960							1
3171	1
3171	1
3383					1
3594						1

Each column header is a category, if the pat_id does not have that category (there are only ten categories) then the result is null, else it's a 1. Each pat_id has one row with a complete set of data, no duplicate pat_id's.

Now I want to do the same thing with the dates associated with those categories - each observation has a category and one date - only three vars - pat_id, category, start_date. "MED" is set to 1 for every record, so Proc Transpose has a var to calculate. Cheesy, I know.

HOWEVER a unique pat_id might have/will have multiple unique categories, BUT two of those categories might have the same start_date for the same pat_id.

Example below.

ROW	PAT_ID	CATEGORY	START_DATE	MED
1	211	ACE	9/11/20	1
2	423	ARB	9/20/18	1
3	423	POT_SPAR_DIURETIC	9/20/18	1
4	634	ACE	1/13/21	1
5	846	LOOP_DIURETIC	12/30/20	1
6	846	POT_SPAR_DIURETIC	12/30/20	1
7	1057	ACE	5/8/20	1
8	1057	POT_SPAR_DIURETIC	8/22/19	1
9	1269	BETA_BLOCKER	2/16/21	1
10	1480	ACE	3/1/21	1
11	1480	BETA_BLOCKER	2/5/21	1
12	1480	THIAZIDE_DIURETIC	3/1/21	1
13	1691	THIAZIDE_DIURETIC	9/7/20	1
14	1903	CCB	2/16/21	1
15	1903	THIAZIDE_DIURETIC	1/19/21	1
16	1903	BETA_BLOCKER	1/7/21	1

We see that rows 10-12 have the same pat_id, with three different, unique categories. So far so good. But two of the categories have the same START_DATE, rows 10 and 12.

Code below throws the following errors, and ends with the final message:

"Too many bad BY groups".

PROC TRANSPOSE DATA = HTN_MEDS_START_NU_2 out = htn_meDS_START_NU3_DT
LABEL=VAR_LABEL ;
BY pat_id ;
VAR start_date ;

RUN;

NOTE: The above message was for the following BY group:
PAT_ID=~~xxxxxxxxx~~
ERROR: The ID value "'02JUL2020'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PAT_ID=~~xxxxxxxxx~~
ERROR: The ID value "'13OCT2020'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PAT_ID=~~xxxxxxxxx~~
ERROR: The ID value "'20NOV2020'n" occurs twice in the same BY group.
ERROR: The ID value "'20NOV2020'n" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
NOTE: The SAS System stopped

This is what I expected:

PAT_ID	THIAZIDE_DATE	LOOP_DATE	CCB_DATE	ACE_DATE
211				09/11/2020
423
423				01/13/2021
634		12/30/2020
846				05/08/2020
846
1057	02/17/2021			03/01/2021
1057	09/07/2020
1269	01/19/2021		03/07/2021
1480
1480	09/25/2020		11/02/2020
1480				09/30/2020
1691			12/28/2020	12/28/2020
1903
1903		10/12/2020
1903				09/10/2020
2114	12/08/2020			02/02/2021
2326				08/20/2019
2326				07/09/2019
2537	01/09/2019
2748				02/22/2021
2748				10/01/2020
2748				06/30/2019
2960			03/20/2019
3171				02/20/2021
3171				06/01/2020
3383				02/22/2021

The category name is the prefix for the column, and the string "_DATE". If the pat_id has an occurrence of "ALPHA_BLKR"

in the original data set, there would be a 1 under ALPHA_BLKR for that pat_id record, and the date column associated with that pat_id would be "ALPHA_BLKR_DATE", and the start_date.

So the question is - at least as I understand - how to place the category name as the prefix in the dates column header, along with "_DATE". AND how to run a PROC transpose on data that has multiple, but identical values (start_date) spread across one unique ID.

I've done prefixes with PROC TRANPOSE before, changing "COL1, COL2, COL3, etc to "Diagnosis1, Diagnosis2, Diagnosis3", but that's a fixed string just incrementing plus 1 for each column - not pulling the contents (Category) of the record as part of the date column header.

THANX.

PaigeMiller · Posted 07-08-2021 06:10 PM

Ignoring the difficulty of doing this transformation for a second, it still seems to me that whatever you want to do as the next step with this data would be much easier programmed on un-transformed data. It is generally considered to be a bad practice to put data (in this case, a calendar date) into a variable name, and it makes the subsequent programming much more difficult.

So, where is this headed? What are you going to do with this data after you transform it (or even if you leave it untransformed)?

--
Paige Miller

ballardw · Posted 07-08-2021 07:01 PM

@Jumboshrimps wrote:

I've seen a couple responses to proc transpose with dates on this site, but none of the solutions worked as I have more than one date per unique ID (pat_id) My first proc transpose from the code below worked perfectly, from the link

https://sasexamplecode.com/complete-guide-to-proc-transpose-in-sas/

producing the screen shot below:

However, each category (ID), appears only ONE time per unique ID (pat_id) - the BY variable.

Proc transpose performs perfectly, after pre-sorting the data set and nodupkey.

PROC TRANSPOSE DATA = HTN_MEDS_START_NU_2 out = htn_meDS_START_NU3
LABEL=VAR_LABEL ;
BY pat_id NOTSORTED;
VAR MED ;
ID CATEGORY;

PAT_ID THIAZIDE LOOP CCB ACE ARB ALPHA_BLKR BETA_BLKR

211 1

423 1 1

423 1

634 1 1

846 1 1

846 1

1057 1 1 1

1057 1

1269 1 1

1480 1

1480 1 1

1480 1

1691 1 1 1

1903 1

1903 1 1

1903 1

2114 1 1

2326 1

2326 1

2537 1

2748 1 1

2748 1

2748 1

2960 1

3171 1

3171 1

3383 1

3594 1

Each column header is a category, if the pat_id does not have that category (there are only ten categories) then the result is null, else it's a 1. Each pat_id has one row with a complete set of data, no duplicate pat_id's.

Now I want to do the same thing with the dates associated with those categories - each observation has a category and one date - only three vars - pat_id, category, start_date. "MED" is set to 1 for every record, so Proc Transpose has a var to calculate. Cheesy, I know.

HOWEVER a unique pat_id might have/will have multiple unique categories, BUT two of those categories might have the same start_date for the same pat_id.

Example below.

ROW PAT_ID CATEGORY START_DATE MED

1 211 ACE 9/11/20 1

2 423 ARB 9/20/18 1

3 423 POT_SPAR_DIURETIC 9/20/18 1

4 634 ACE 1/13/21 1

5 846 LOOP_DIURETIC 12/30/20 1

6 846 POT_SPAR_DIURETIC 12/30/20 1

7 1057 ACE 5/8/20 1

8 1057 POT_SPAR_DIURETIC 8/22/19 1

9 1269 BETA_BLOCKER 2/16/21 1

10 1480 ACE 3/1/21 1

11 1480 BETA_BLOCKER 2/5/21 1

12 1480 THIAZIDE_DIURETIC 3/1/21 1

13 1691 THIAZIDE_DIURETIC 9/7/20 1

14 1903 CCB 2/16/21 1

15 1903 THIAZIDE_DIURETIC 1/19/21 1

16 1903 BETA_BLOCKER 1/7/21 1

We see that rows 10-12 have the same pat_id, with three different, unique categories. So far so good. But two of the categories have the same START_DATE, rows 10 and 12.

Code below throws the following errors, and ends with the final message:

"Too many bad BY groups".

PROC TRANSPOSE DATA = HTN_MEDS_START_NU_2 out = htn_meDS_START_NU3_DT
LABEL=VAR_LABEL ;
BY pat_id ;
VAR start_date ;

RUN;

NOTE: The above message was for the following BY group:
PAT_ID=~~xxxxxxxxx~~
ERROR: The ID value "'02JUL2020'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PAT_ID=~~xxxxxxxxx~~
ERROR: The ID value "'13OCT2020'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PAT_ID=~~xxxxxxxxx~~
ERROR: The ID value "'20NOV2020'n" occurs twice in the same BY group.
ERROR: The ID value "'20NOV2020'n" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
NOTE: The SAS System stopped

This is what I expected:

PAT_ID THIAZIDE_DATE LOOP_DATE CCB_DATE ACE_DATE

211 09/11/2020

423

423 01/13/2021

634 12/30/2020

846 05/08/2020

846

1057 02/17/2021 03/01/2021

1057 09/07/2020

1269 01/19/2021 03/07/2021

1480

1480 09/25/2020 11/02/2020

1480 09/30/2020

1691 12/28/2020 12/28/2020

1903

1903 10/12/2020

1903 09/10/2020

2114 12/08/2020 02/02/2021

2326 08/20/2019

2326 07/09/2019

2537 01/09/2019

2748 02/22/2021

2748 10/01/2020

2748 06/30/2019

2960 03/20/2019

3171 02/20/2021

3171 06/01/2020

3383 02/22/2021

The category name is the prefix for the column, and the string "_DATE". If the pat_id has an occurrence of "ALPHA_BLKR"

in the original data set, there would be a 1 under ALPHA_BLKR for that pat_id record, and the date column associated with that pat_id would be "ALPHA_BLKR_DATE", and the start_date.

So the question is - at least as I understand - how to place the category name as the prefix in the dates column header, along with "_DATE". AND how to run a PROC transpose on data that has multiple, but identical values (start_date) spread across one unique ID.

I've done prefixes with PROC TRANPOSE before, changing "COL1, COL2, COL3, etc to "Diagnosis1, Diagnosis2, Diagnosis3", but that's a fixed string just incrementing plus 1 for each column - not pulling the contents (Category) of the record as part of the date column header.

THANX.

Please make sure that your "expected" bears at least some resemblance to the input data.

Your "example" data for pat_id 1480 shows dates of 3/1/21, 3/1/21 and 2/5/21. Your "expected" data for pat_id 1480 shows dates of 09/25/2020, 11/02/2020 and 09/30/2020. Please describe exactly how you get an "expected" result of those dates from the "example" data.

Also I strongly suspect the code shown does not create the warning shown. The warning relates to an ID variable and would typically appear when there is an ID statement with a variable that has repeats. Your code shows no ID statement.

I have to say this output probably belongs in a report not transposed data.

Jumboshrimps · Posted 07-08-2021 09:46 PM

The two examples are MASKED confidential patient data. The do NOT align. For illustration purposes only. I know they are not the same. Focus is to figure out how to incorporate a char category variable along with "_DATE" into a proc transposed column header of date values by pat_id. Not if examples used for illustration purposes are consistent. You get the idea.

At this point, I cannot say with 100% certainty where this is going. It is a bunch of Enterprise Guide 7.13 legacy code made by someone who know how to drag-and-drop components only.

My task is to try to refactor the 10,000 lines of incomprehensible, undocumented, uncommented code into SAS Studio code that can be loaded back into the Warehouse on an automated basis.

The proc transpose code module show above, (that one that works) replaced 80 lines of code.

ChrisNZ · Posted 07-09-2021 01:34 AM

The message is clear

ERROR: The ID value "'02JUL2020'n" occurs twice in the same BY group.

You can't have 2 columns named the same (i.e named with the same date in this case).

So the data is not ready to be transposed by ID and DATE as these are not unique in the data.

High-Performance SAS Coding - Third Edition

Kurt_Bremser · Posted 07-09-2021 02:39 AM

The code you posted CANNOT throw the messages, as it does not contain an ID statement. Please re-post your question in a consistent manner, so that data, expected result and code match the log.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 07-09-2021 03:45 AM

@Jumboshrimps wrote:

The two examples are MASKED confidential patient data. The do NOT align. For illustration purposes only. I know they are not the same. Focus is to figure out how to incorporate a char category variable along with "_DATE" into a proc transposed column header of date values by pat_id. Not if examples used for illustration purposes are consistent. You get the idea.

At this point, I cannot say with 100% certainty where this is going. It is a bunch of Enterprise Guide 7.13 legacy code made by someone who know how to drag-and-drop components only.

My task is to try to refactor the 10,000 lines of incomprehensible, undocumented, uncommented code into SAS Studio code that can be loaded back into the Warehouse on an automated basis.

The proc transpose code module show above, (that one that works) replaced 80 lines of code.

I submit that "illustration purposes" that do not "match" obfuscate the desired goal and do not "illustrate" the desired process.

If the purpose is to get a column name like ABC_DATE then you have made no attempt to do so. Extract the part of the category you need, append "_DATE" to that and use that value as an ID.

Question: Does the same category for a given patient have the Start_date duplicated? If so, you need to show exactly what you expect to occur with one of those. "Illustrative data" is sufficient but the input must allow creating the output.

Below is an example of 1) how to provide data in form we can use that 2) adds a variable for a name and creates something sort of like what you are requesting.

If you have duplicates of start_date and category then I suggest adding NODUPEKEY to the Proc Sort. If you have a valid reason to duplicate the start_date category then Proc Transpose is not going to get you there. You would have to add a second variable with the sequence of occurence, use and ID statement like ID varname seqnum and accept variables with names like ABC_date1 ABC_date2.

data have;
  infile datalines dlm='|';
  input 	PAT_ID :$5.	CATEGORY :$25. 	START_DATE :mmddyy8.	MED;
  VarName = cats(scan(category,1,'_'),"_DATE");
  format start_date mmddyy10.;
datalines;
211|ACE|9/11/20|1
423|ARB|9/20/18|1
423|POT_SPAR_DIURETIC|9/20/18|1
634|ACE|1/13/21|1
846|LOOP_DIURETIC|12/30/20|1
846|POT_SPAR_DIURETIC|12/30/20|1
1057|ACE|5/8/20|1
1057|POT_SPAR_DIURETIC|8/22/19|1
1269|BETA_BLOCKER|2/16/21|1
1480|ACE|3/1/21|1
1480|BETA_BLOCKER|2/5/21|1
1480|THIAZIDE_DIURETIC|3/1/21|1
1691|THIAZIDE_DIURETIC|9/7/20|1
1903|CCB|2/16/21|1
1903|THIAZIDE_DIURETIC|1/19/21|1
1903|BETA_BLOCKER|1/7/21|1
;

proc sort data=have;
   by pat_id start_date category;
run;

proc transpose data=have
     out=want (drop=_name_ start_date);
   by pat_id start_date;
   var start_date;
   id varname;
run;

PaigeMiller · Posted 07-09-2021 06:10 AM

@Jumboshrimps wrote:

At this point, I cannot say with 100% certainty where this is going. It is a bunch of Enterprise Guide 7.13 legacy code made by someone who know how to drag-and-drop components only.

My task is to try to refactor the 10,000 lines of incomprehensible, undocumented, uncommented code into SAS Studio code that can be loaded back into the Warehouse on an automated basis.

The proc transpose code module show above, (that one that works) replaced 80 lines of code.

You must have some idea about how this output is used. You inherited something that produces a report, or an analysis, or an output. You did not inherit something that does complicated transposing and then stops there and does nothing with the transposed data.

If you don't know how this is being used, then I suggest that's the first step you need to tackle.

And in any event, I will stick with my opinion that putting dates into variable names is a poor practice, and much easier methods of dealing with this data are available if you do not transpose.

--
Paige Miller

PROC TRANSPOSE with dates as column heades

Re: PROC TRANSPOSE with dates as column heades

Re: PROC TRANSPOSE with dates as column heades

Re: PROC TRANSPOSE with dates as column heades

Re: PROC TRANSPOSE with dates as column heades

Re: PROC TRANSPOSE with dates as column heades

Re: PROC TRANSPOSE with dates as column heades

Re: PROC TRANSPOSE with dates as column heades

SAS Innovate 2025: Call for Content

Classroom Training Available!