BookmarkSubscribeRSS Feed
Jumboshrimps
Obsidian | Level 7

I've seen a couple responses to proc transpose with dates on this site, but none of the solutions worked as I have more than one date per unique ID (pat_id)  My first proc transpose from the code below worked perfectly, from the link

https://sasexamplecode.com/complete-guide-to-proc-transpose-in-sas/

producing the screen shot below:

However, each category (ID), appears only ONE time per unique ID (pat_id) -  the BY variable.

Proc transpose performs perfectly, after pre-sorting the data set and nodupkey.

PROC TRANSPOSE DATA = HTN_MEDS_START_NU_2 out = htn_meDS_START_NU3
LABEL=VAR_LABEL ;
BY pat_id NOTSORTED;
VAR MED ;
ID CATEGORY;

PAT_ID THIAZIDE LOOP CCB ACE ARB ALPHA_BLKR BETA_BLKR
211 1            
423   1 1        
423 1            
634     1 1      
846 1   1        
846         1    
1057 1       1 1  
1057           1  
1269           1 1
1480         1    
1480           1 1
1480 1            
1691 1       1   1
1903         1    
1903       1 1    
1903 1            
2114 1         1  
2326 1            
2326 1            
2537           1  
2748 1       1    
2748 1            
2748 1            
2960             1
3171 1            
3171 1            
3383         1    
3594           1  

 

Each column header is a category, if the pat_id does not have that category (there are only ten categories) then the result is null, else it's a 1.    Each pat_id has one row with a complete set of data, no duplicate pat_id's.

 

Now I want to do the same thing with the dates associated with those categories -  each observation has a category and one date - only three vars - pat_id, category, start_date.      "MED" is set to 1 for every record, so Proc Transpose has a var to calculate.  Cheesy, I know.                                              

HOWEVER a unique pat_id might have/will have multiple unique categories,                                                                             BUT two of those categories might have the same start_date for the same pat_id.                                                                                         

Example below.

 

ROW PAT_ID CATEGORY START_DATE MED
1 211 ACE 9/11/20 1
2 423 ARB 9/20/18 1
3 423 POT_SPAR_DIURETIC 9/20/18 1
4 634 ACE 1/13/21 1
5 846 LOOP_DIURETIC 12/30/20 1
6 846 POT_SPAR_DIURETIC 12/30/20 1
7 1057 ACE 5/8/20 1
8 1057 POT_SPAR_DIURETIC 8/22/19 1
9 1269 BETA_BLOCKER 2/16/21 1
10 1480 ACE 3/1/21 1
11 1480 BETA_BLOCKER 2/5/21 1
12 1480 THIAZIDE_DIURETIC 3/1/21 1
13 1691 THIAZIDE_DIURETIC 9/7/20 1
14 1903 CCB 2/16/21 1
15 1903 THIAZIDE_DIURETIC 1/19/21 1
16 1903 BETA_BLOCKER 1/7/21 1

 

We see that rows 10-12 have the same pat_id, with three different, unique categories.  So far so good. But two of the categories have the same START_DATE, rows 10 and 12.

Code below throws the following errors, and ends with the final message:

"Too many bad BY groups".

PROC TRANSPOSE DATA = HTN_MEDS_START_NU_2 out = htn_meDS_START_NU3_DT
LABEL=VAR_LABEL ;
BY pat_id ;
VAR start_date ;

RUN;

 NOTE: The above message was for the following BY group:
PAT_ID=xxxxxxxxx
ERROR: The ID value "'02JUL2020'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PAT_ID=xxxxxxxxx
ERROR: The ID value "'13OCT2020'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PAT_ID=xxxxxxxxx
ERROR: The ID value "'20NOV2020'n" occurs twice in the same BY group.
ERROR: The ID value "'20NOV2020'n" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
NOTE: The SAS System stopped 

 

This is what I expected:

 

PAT_ID THIAZIDE_DATE LOOP_DATE CCB_DATE ACE_DATE
211       09/11/2020
423        
423       01/13/2021
634   12/30/2020    
846       05/08/2020
846        
1057 02/17/2021     03/01/2021
1057 09/07/2020      
1269 01/19/2021   03/07/2021  
1480        
1480 09/25/2020   11/02/2020  
1480       09/30/2020
1691     12/28/2020 12/28/2020
1903        
1903   10/12/2020    
1903       09/10/2020
2114 12/08/2020     02/02/2021
2326       08/20/2019
2326       07/09/2019
2537 01/09/2019      
2748       02/22/2021
2748       10/01/2020
2748       06/30/2019
2960     03/20/2019  
3171       02/20/2021
3171       06/01/2020
3383       02/22/2021

 

The category name is the prefix for the column, and the string "_DATE".  If the pat_id has an occurrence of "ALPHA_BLKR"

in the original data set, there would be a 1 under ALPHA_BLKR for that pat_id record, and the date column associated with that pat_id would be "ALPHA_BLKR_DATE", and the start_date.  

So the question is - at least as I understand - how to place the category name as the prefix in the dates column header, along with "_DATE".  AND how to run a PROC transpose on data that has multiple, but identical values (start_date) spread across one unique ID.

I've done prefixes with PROC TRANPOSE before, changing "COL1,  COL2, COL3, etc to "Diagnosis1, Diagnosis2, Diagnosis3", but that's a fixed string just incrementing plus 1 for each column  - not pulling the contents (Category) of the record as part of the date column header.

 

THANX.

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Ignoring the difficulty of doing this transformation for a second, it still seems to me that whatever you want to do as the next step with this data would be much easier programmed on un-transformed data. It is generally considered to be a bad practice to put data (in this case, a calendar date) into a variable name, and it makes the subsequent programming much more difficult.

 

So, where is this headed? What are you going to do with this data after you transform it (or even if you leave it untransformed)?

 

 

--
Paige Miller
ballardw
Super User

@Jumboshrimps wrote:

I've seen a couple responses to proc transpose with dates on this site, but none of the solutions worked as I have more than one date per unique ID (pat_id)  My first proc transpose from the code below worked perfectly, from the link

https://sasexamplecode.com/complete-guide-to-proc-transpose-in-sas/

producing the screen shot below:

However, each category (ID), appears only ONE time per unique ID (pat_id) -  the BY variable.

Proc transpose performs perfectly, after pre-sorting the data set and nodupkey.

PROC TRANSPOSE DATA = HTN_MEDS_START_NU_2 out = htn_meDS_START_NU3
LABEL=VAR_LABEL ;
BY pat_id NOTSORTED;
VAR MED ;
ID CATEGORY;

PAT_ID THIAZIDE LOOP CCB ACE ARB ALPHA_BLKR BETA_BLKR
211 1            
423   1 1        
423 1            
634     1 1      
846 1   1        
846         1    
1057 1       1 1  
1057           1  
1269           1 1
1480         1    
1480           1 1
1480 1            
1691 1       1   1
1903         1    
1903       1 1    
1903 1            
2114 1         1  
2326 1            
2326 1            
2537           1  
2748 1       1    
2748 1            
2748 1            
2960             1
3171 1            
3171 1            
3383         1    
3594           1  

 

Each column header is a category, if the pat_id does not have that category (there are only ten categories) then the result is null, else it's a 1.    Each pat_id has one row with a complete set of data, no duplicate pat_id's.

 

Now I want to do the same thing with the dates associated with those categories -  each observation has a category and one date - only three vars - pat_id, category, start_date.      "MED" is set to 1 for every record, so Proc Transpose has a var to calculate.  Cheesy, I know.                                              

HOWEVER a unique pat_id might have/will have multiple unique categories,                                                                             BUT two of those categories might have the same start_date for the same pat_id.                                                                                         

Example below.

 

ROW PAT_ID CATEGORY START_DATE MED
1 211 ACE 9/11/20 1
2 423 ARB 9/20/18 1
3 423 POT_SPAR_DIURETIC 9/20/18 1
4 634 ACE 1/13/21 1
5 846 LOOP_DIURETIC 12/30/20 1
6 846 POT_SPAR_DIURETIC 12/30/20 1
7 1057 ACE 5/8/20 1
8 1057 POT_SPAR_DIURETIC 8/22/19 1
9 1269 BETA_BLOCKER 2/16/21 1
10 1480 ACE 3/1/21 1
11 1480 BETA_BLOCKER 2/5/21 1
12 1480 THIAZIDE_DIURETIC 3/1/21 1
13 1691 THIAZIDE_DIURETIC 9/7/20 1
14 1903 CCB 2/16/21 1
15 1903 THIAZIDE_DIURETIC 1/19/21 1
16 1903 BETA_BLOCKER 1/7/21 1

 

We see that rows 10-12 have the same pat_id, with three different, unique categories.  So far so good. But two of the categories have the same START_DATE, rows 10 and 12.

Code below throws the following errors, and ends with the final message:

"Too many bad BY groups".

PROC TRANSPOSE DATA = HTN_MEDS_START_NU_2 out = htn_meDS_START_NU3_DT
LABEL=VAR_LABEL ;
BY pat_id ;
VAR start_date ;

RUN;

 NOTE: The above message was for the following BY group:
PAT_ID=xxxxxxxxx
ERROR: The ID value "'02JUL2020'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PAT_ID=xxxxxxxxx
ERROR: The ID value "'13OCT2020'n" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
PAT_ID=xxxxxxxxx
ERROR: The ID value "'20NOV2020'n" occurs twice in the same BY group.
ERROR: The ID value "'20NOV2020'n" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
NOTE: The SAS System stopped 

 

This is what I expected:

 

PAT_ID THIAZIDE_DATE LOOP_DATE CCB_DATE ACE_DATE
211       09/11/2020
423        
423       01/13/2021
634   12/30/2020    
846       05/08/2020
846        
1057 02/17/2021     03/01/2021
1057 09/07/2020      
1269 01/19/2021   03/07/2021  
1480        
1480 09/25/2020   11/02/2020  
1480       09/30/2020
1691     12/28/2020 12/28/2020
1903        
1903   10/12/2020    
1903       09/10/2020
2114 12/08/2020     02/02/2021
2326       08/20/2019
2326       07/09/2019
2537 01/09/2019      
2748       02/22/2021
2748       10/01/2020
2748       06/30/2019
2960     03/20/2019  
3171       02/20/2021
3171       06/01/2020
3383       02/22/2021

 

The category name is the prefix for the column, and the string "_DATE".  If the pat_id has an occurrence of "ALPHA_BLKR"

in the original data set, there would be a 1 under ALPHA_BLKR for that pat_id record, and the date column associated with that pat_id would be "ALPHA_BLKR_DATE", and the start_date.  

So the question is - at least as I understand - how to place the category name as the prefix in the dates column header, along with "_DATE".  AND how to run a PROC transpose on data that has multiple, but identical values (start_date) spread across one unique ID.

I've done prefixes with PROC TRANPOSE before, changing "COL1,  COL2, COL3, etc to "Diagnosis1, Diagnosis2, Diagnosis3", but that's a fixed string just incrementing plus 1 for each column  - not pulling the contents (Category) of the record as part of the date column header.

 

THANX.


Please make sure that your "expected" bears at least some resemblance to the input data.

Your "example" data for pat_id 1480 shows dates of 3/1/21, 3/1/21 and 2/5/21. Your "expected" data for pat_id 1480 shows dates of  09/25/2020, 11/02/2020 and 09/30/2020. Please describe exactly how you get an "expected" result of those dates from the "example" data.

 

Also I strongly suspect the code shown does not create the warning shown. The warning relates to an ID variable and would typically appear when there is an ID statement with a variable that has repeats. Your code shows no ID statement.

 

I have to say this output probably belongs in a report not transposed data.

 

 

Jumboshrimps
Obsidian | Level 7

The two examples are MASKED confidential patient data.  The do NOT align.  For illustration purposes only. I know they are not the same.  Focus is to figure out how to incorporate a char category variable along with  "_DATE" into a proc transposed column header of date values by pat_id.  Not if examples used for illustration purposes are consistent.  You get the idea.

At this point, I cannot say with 100% certainty where this is going.  It is a bunch of Enterprise Guide 7.13 legacy code made by someone who know how to drag-and-drop components only. 

My task is to try to refactor the 10,000 lines of incomprehensible, undocumented, uncommented code into SAS Studio code that can be loaded back into the Warehouse on an automated basis.  

The  proc transpose code module show above, (that one that works) replaced 80 lines of code.  

ChrisNZ
Tourmaline | Level 20

The message is clear

ERROR: The ID value "'02JUL2020'n" occurs twice in the same BY group.

You can't have 2 columns named the same (i.e named with the same date in this case). 

So the data is not ready to be transposed by ID and DATE as these are not unique in the data.

 

Kurt_Bremser
Super User

The code you posted CANNOT throw the messages, as it does not contain an ID statement. Please re-post your question in a consistent manner, so that data, expected result and code match the log.

ballardw
Super User

@Jumboshrimps wrote:

The two examples are MASKED confidential patient data.  The do NOT align.  For illustration purposes only. I know they are not the same.  Focus is to figure out how to incorporate a char category variable along with  "_DATE" into a proc transposed column header of date values by pat_id.  Not if examples used for illustration purposes are consistent.  You get the idea.

At this point, I cannot say with 100% certainty where this is going.  It is a bunch of Enterprise Guide 7.13 legacy code made by someone who know how to drag-and-drop components only. 

My task is to try to refactor the 10,000 lines of incomprehensible, undocumented, uncommented code into SAS Studio code that can be loaded back into the Warehouse on an automated basis.  

The  proc transpose code module show above, (that one that works) replaced 80 lines of code.  


I submit that "illustration purposes" that do not "match" obfuscate the desired goal and do not "illustrate" the desired process.

 

If the purpose is to get a column name like ABC_DATE then you have made no attempt to do so. Extract the part of the category  you need, append "_DATE" to that and use that value as an ID.

Question: Does the same category for a given patient have the Start_date duplicated? If so, you need to show exactly what you expect to occur with one of those. "Illustrative data" is sufficient but the input must allow creating the output.

 

Below is an example of 1) how to provide data in form we can use that 2) adds a variable for a name and creates something sort of like what you are requesting.

 

If you have duplicates of start_date and category then I suggest adding NODUPEKEY to the Proc Sort. If you have a valid reason to duplicate the start_date category then Proc Transpose is not going to get you there. You would have to add a second variable with the sequence of occurence, use and ID statement like ID varname seqnum and accept variables with names like ABC_date1 ABC_date2.

data have;
  infile datalines dlm='|';
  input 	PAT_ID :$5.	CATEGORY :$25. 	START_DATE :mmddyy8.	MED;
  VarName = cats(scan(category,1,'_'),"_DATE");
  format start_date mmddyy10.;
datalines;
211|ACE|9/11/20|1
423|ARB|9/20/18|1
423|POT_SPAR_DIURETIC|9/20/18|1
634|ACE|1/13/21|1
846|LOOP_DIURETIC|12/30/20|1
846|POT_SPAR_DIURETIC|12/30/20|1
1057|ACE|5/8/20|1
1057|POT_SPAR_DIURETIC|8/22/19|1
1269|BETA_BLOCKER|2/16/21|1
1480|ACE|3/1/21|1
1480|BETA_BLOCKER|2/5/21|1
1480|THIAZIDE_DIURETIC|3/1/21|1
1691|THIAZIDE_DIURETIC|9/7/20|1
1903|CCB|2/16/21|1
1903|THIAZIDE_DIURETIC|1/19/21|1
1903|BETA_BLOCKER|1/7/21|1
;

proc sort data=have;
   by pat_id start_date category;
run;

proc transpose data=have
     out=want (drop=_name_ start_date);
   by pat_id start_date;
   var start_date;
   id varname;
run;

 

 

 

 

PaigeMiller
Diamond | Level 26

@Jumboshrimps wrote:

 

At this point, I cannot say with 100% certainty where this is going.  It is a bunch of Enterprise Guide 7.13 legacy code made by someone who know how to drag-and-drop components only. 

My task is to try to refactor the 10,000 lines of incomprehensible, undocumented, uncommented code into SAS Studio code that can be loaded back into the Warehouse on an automated basis.  

The  proc transpose code module show above, (that one that works) replaced 80 lines of code.  


You must have some idea about how this output is used. You inherited something that produces a report, or an analysis, or an output. You did not inherit something that does complicated transposing and then stops there and does nothing with the transposed data.

 

If you don't know how this is being used, then I suggest that's the first step you need to tackle.

 

And in any event, I will stick with my opinion that putting dates into variable names is a poor practice, and much easier methods of dealing with this data are available if you do not transpose.

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1946 views
  • 0 likes
  • 5 in conversation