- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am attempting to combine multiple account records with multiple variables into one observation with the TRANSPOSE procedure but I keep getting the following error:
ERROR: The ID1 value "_77144" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
The log generates repeated errors for the ID1 field.
Here is my code:
proc transpose
data=libname.system_data
out=libname.revised_system_data;
by ID1;
id ID2;
var ID3 Date1 Amount Date2 Code1 Date3 Action1 Indicator1 Indicator2; (these are the field values that need to be transposed)
run;
ID1 and ID2 have different values but will be the same within each observation.
Do I need to separate the variables with an individual transpose for each one or will a do loop or arrays work within this procedure?
Any assistance would be greatly appreciated. Thanks.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I added the var COL1 code to the second PROC TRANSPOSE statement and it produced the output in the layout that I need in order to facilitate reporting. I'm set. I'll look at that macro to automate the process. Thanks for the assistance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Can you give an example of your input dataset, and what you are expecting as an output?
JD
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Given below in Data Set A are the initial LDL cholesterol level of each individual before receiving the drug or the placebo and the LDL reading 3 months after being placed on the drug/placebo. The patients are identified by a three digit code. After the patient code the LDL level of the patient is given, followed by a code that identifies if the LDL reading is the pre-drug/placebo reading (PRE) at zero months or the reading taken 3 months later (POST). Please note that two of the patients did not show up for the post drug/placebo testing.
DATA SET A
101 130 PRE
101 125 POST
102 180 PRE
102 120 POST
103 165 PRE
103 115 POST
104 145 PRE
104 150 POST
105 190 PRE
105 180 POST
106 125 PRE
106 70 POST
107 155 PRE
107 145 POST
108 175 PRE
108 110 POST
109 145 PRE
109 90 POST
110 120 PRE
110 125 POST
111 155 PRE
112 165 PRE
Below you will find the information giving whether the patient receive the drug (D) or the Placebo (P).
DATA SET B
101 P
102 D
103 D
104 P
105 P
106 D
107 P
108 D
109 D
110 P
111 P
112 D
- Using only the SAS data sets you created in Part (a), create a data set that contains individual ID, before and after LDL levels, percentage reduction in the LDL level, and the whether each individual belongs to the Drug group or the Placebo group . Include SAS commands to print out this data set. No variables other than ID, Drug/placebo status, LDL level at zero months (PRE level), LDL level at 3 months (POST level), and Percentage Reduction should be included in the data set.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi @greg6363
It seems that your input data have more than one record with the same couple of values for ID1 and ID2.
That's why SAS in unable to transpose properly the data.
Is it something which is considered as normal in your input data? In this case, how do you what result do you expect when there are 2 observations with the same ID1 and ID2?
Best,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please copy the lines from the SAS log and paste into your message. Make sure to use the Insert Code button, looks like {i}, so that the formatting is preserved.
Also post example data and desired output. Preferable as code that can be run to recreate the datasets.
It looks like you are trying to do this:
data have;
input id1 id2 $ var1 var2 ;
cards;
1 new_name 10 20
2 new_name 30 40
;
proc transpose data=have out=want ;
by id1 ;
id id2 ;
var var1-var2;
run;
To produce this:
Obs id1 _NAME_ new_name 1 1 var1 10 2 1 var2 20 3 2 var1 30 4 2 var2 40
Now if you have multiple observations per by group (ID1 values) then each one needs to have a distinct value of ID2 so that they will become a new variable in the output. If not you have duplicates like this:
data have;
input id1 id2 $ var1 var2 ;
cards;
1 new_name 10 20
2 new_name 30 40
2 new_name 50 60
;
Then you get this error message:
112 proc transpose data=have out=want ; 113 by id1 ; 114 id id2 ; 115 var var1-var2; 116 run; ERROR: The ID value "new_name" occurs twice in the same BY group. NOTE: The above message was for the following BY group: id1=2 WARNING: 1 BY groups omitted due to earlier errors. NOTE: There were 3 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 2 observations and 3 variables.
You can either modify ID2 (or add more variables to the ID statement) so that each row in each by group as a unique name. Or just remove the ID statement and let SAS make up the names.
Obs id1 _NAME_ COL1 COL2 1 1 var1 10 . 2 1 var2 20 . 3 2 var1 30 50 4 2 var2 40 60
You can use the PREFIX option on the PROC statement to tell to use something other than COL as the basis for the new names.
Or you could add more variables to the BY statement so that each observation is uniquely identified. Then each observations will produce values for only one variable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Let me give an example of the data set and the expected layout of the output:
ID1 ID2 ID3 Date1 Amount Date2 Code1 Date3 Action1 Indicator1 Indicator2
6666 7777 LP219 7/20/19 $100 8/1/19 SDA 8/4/19 007 1 0
6666 7777 SN015 7/21/19 $200 8/4/19 BND 8/15/19 192 1 0
6666 7777 BP123 7/24/19 $250 8/12/19 COS 8/18/19 308 1 0
6666 7777 CD954 7/25/19 $150 8/15/19 DTE 8/22/19 482 1 0
6666 7777 FF773 7/28/19 $125 8/24/19 RUI 8/28/19 732 1 0
The ID1 and ID2 fields are the same for each observation which is why I included both fields in the by statement. All the other variables have different values which I want to transpose onto one observation line.
The output (with the respective variables) after the transpose should look like this layout:
ID1 ID2
ID3_1 Date1_1 Amount_1 Date2_1 Code1_1 Date3_1 Action1_1 Indicator1_1 Indicator2_1
ID3_2 Date1_2 Amount_2 Date2_2 Code1_2 Date3_2 Action1_2 Indicator1_2 Indicator2_2
ID3_3 Date1_3 Amount_3 Date2_3 Code1_3 Date3_3 Action1_3 Indicator1_3 Indicator2_3
ID3_4 Date1_3 Amount_4 Date2_4 Code1_4 Date3_4 Action1_4 Indicator1_4 Indicator2_4
ID3_5 Date1_3 Amount_5 Date2_5 Code1_5 Date3_5 Action1_5 Indicator1_5 Indicator2_5
So the unique record starts at the ID3 field which is how I want the record to be arranged in the output.
I'm trying to see if I can create an array that will give me this particular output layout.
Any feedback is greatly appreciated. Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I don't get why do you want ID1 and ID2 in the first row, but you can't do that with a single proc transpose due to duplicates. You can try out the following code:
/* We transpose the variables */ proc transpose data=have out=want(drop=_name_) ; var ID3 Date1 Amount Date2 Code1 Date3 Action1 Indicator1 Indicator2; run; /* We copy ID1 and ID2 in the first line, twofirst variables of the new dataset */ data wanted (drop=id1 id2); set have (keep=ID1 ID2 obs=1) want; if _n_=1 then do; col1=ID1; col2=ID2; output; end; else output; run;
Regards,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Unfortunately, this code creates one long string and doesn't separate by ID1 and ID2.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So you need to do a double transpose. But first you need to add a variable so that each existing rows can be uniquely identified.
data have ;
input ID1 ID2 ID3 $ Date1 :mmddyy. Amount :comma. Date2 :mmddyy. Code1 $ Date3 :mmddyy. Action1 Indicator1 Indicator2 ;
cards;
6666 7777 LP219 7/20/19 $100 8/1/19 SDA 8/4/19 007 1 0
6666 7777 SN015 7/21/19 $200 8/4/19 BND 8/15/19 192 1 0
6666 7777 BP123 7/24/19 $250 8/12/19 COS 8/18/19 308 1 0
6666 7777 CD954 7/25/19 $150 8/15/19 DTE 8/22/19 482 1 0
6666 7777 FF773 7/28/19 $125 8/24/19 RUI 8/28/19 732 1 0
;
data step1;
set have;
by id1 id2;
if first.id2 then row=0;
row+1;
run;
So transpose it once to get it into one tall skinny table.
proc transpose data=step1 out=step2;
by id1 id2 row ;
run;
Then transpose it again to get it into the wide-wide table format you requested. You can add a SORT step to change the order that the variables get created if you want. Note that you will need to add a FORMAT statement to attach any needed formats (like for your date varaibles).
proc sort data=step2;
by id1 id2 _name_ row;
run;
proc transpose data=step2 out=want(drop=_name_) delim=_;
by id1 id2 ;
id _name_ row ;
format date: date9. ;
run;
Lets print the example to see what it looks like:
170 proc print data=want width=min; 171 id id1 id2 ; 172 run; NOTE: There were 1 observations read from the data set WORK.WANT.
Action1_ Action1_ Action1_ Action1_ Action1_ ID1 ID2 1 2 3 4 5 Amount_1 Amount_2 Amount_3 Amount_4 Amount_5 6666 7777 7 192 308 482 732 100 200 250 150 125 ID1 ID2 Date1_1 Date1_2 Date1_3 Date1_4 Date1_5 Date2_1 Date2_2 Date2_3 Date2_4 6666 7777 20JUL2019 21JUL2019 24JUL2019 25JUL2019 28JUL2019 01AUG2019 04AUG2019 12AUG2019 15AUG2019 Indicator1_ Indicator1_ Indicator1_ ID1 ID2 Date2_5 Date3_1 Date3_2 Date3_3 Date3_4 Date3_5 1 2 3 6666 7777 24AUG2019 04AUG2019 15AUG2019 18AUG2019 22AUG2019 28AUG2019 1 1 1 Indicator1_ Indicator1_ Indicator2_ Indicator2_ Indicator2_ Indicator2_ Indicator2_ ID1 ID2 4 5 1 2 3 4 5 6666 7777 1 1 0 0 0 0 0
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The code does create a single observation but when I generate the output, I still have four fields missing (ID3, Date2, Code1 and Date3). Do I need to perform one more transpose for these variables? Please advise. Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@greg6363 wrote:
No, no. You're splitting the variables up. I want to transpose the entire string from each observation while keeping it intact.
That does not make any sense. What string are you talking about?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@greg6363 wrote:
The code does create a single observation but when I generate the output, I still have four fields missing (ID3, Date2, Code1 and Date3). Do I need to perform one more transpose for these variables? Please advise. Thanks.
Proc transpose without a VAR statement will only work on numeric variables. Are the missing variables character?
You might need to replicate the process for the numeric and character variables separately in the first PROC TRANSPOSE. To get just the character variables add a var statement like:
VAR _character_;
Depending on whether ID1 and ID2 are character or numeric you might need to filter their value out at some point from the transposed data.
There is also some useful macros that have been posted that make this process easier.
Check with @art297 .
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, the other fields are character formatted so let me make the adjustment on my end and let you know the outcome. Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Could I go back to the data set and convert all the fields (even the dates) into text fields and use the var _character_ statement? Let me know. Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content