Solved: Re: PROC TRANSPOSE with multiple variables

greg6363 · Posted 12-13-2019 08:37 PM

I am attempting to combine multiple account records with multiple variables into one observation with the TRANSPOSE procedure but I keep getting the following error:

ERROR: The ID1 value "_77144" occurs twice in the same BY group.

ERROR: Too many bad BY groups.

The log generates repeated errors for the ID1 field.

Here is my code:

proc transpose
data=libname.system_data
out=libname.revised_system_data;
by ID1;
id ID2;
var ID3 Date1 Amount Date2 Code1 Date3 Action1 Indicator1 Indicator2; (these are the field values that need to be transposed)
run;

ID1 and ID2 have different values but will be the same within each observation.

Do I need to separate the variables with an individual transpose for each one or will a do loop or arrays work within this procedure?

Any assistance would be greatly appreciated. Thanks.

greg6363 · Posted 12-16-2019 04:56 PM

I added the var COL1 code to the second PROC TRANSPOSE statement and it produced the output in the layout that I need in order to facilitate reporting. I'm set. I'll look at that macro to automate the process. Thanks for the assistance.

View solution in original post

JeanDo · Posted 12-14-2019 04:59 AM

Hello,

Can you give an example of your input dataset, and what you are expecting as an output?

JD

Salami · Posted 06-13-2021 08:36 PM

Given below in Data Set A are the initial LDL cholesterol level of each individual before receiving the drug or the placebo and the LDL reading 3 months after being placed on the drug/placebo. The patients are identified by a three digit code. After the patient code the LDL level of the patient is given, followed by a code that identifies if the LDL reading is the pre-drug/placebo reading (PRE) at zero months or the reading taken 3 months later (POST). Please note that two of the patients did not show up for the post drug/placebo testing.

DATA SET A

101 130 PRE

101 125 POST

102 180 PRE

102 120 POST

103 165 PRE

103 115 POST

104 145 PRE

104 150 POST

105 190 PRE

105 180 POST

106 125 PRE

106 70 POST

107 155 PRE

107 145 POST

108 175 PRE

108 110 POST

109 145 PRE

109 90 POST

110 120 PRE

110 125 POST

111 155 PRE

112 165 PRE

Below you will find the information giving whether the patient receive the drug (D) or the Placebo (P).

DATA SET B

101 P

102 D

103 D

104 P

105 P

106 D

107 P

108 D

109 D

110 P

111 P

112 D

Using only the SAS data sets you created in Part (a), create a data set that contains individual ID, before and after LDL levels, percentage reduction in the LDL level, and the whether each individual belongs to the Drug group or the Placebo group . Include SAS commands to print out this data set. No variables other than ID, Drug/placebo status, LDL level at zero months (PRE level), LDL level at 3 months (POST level), and Percentage Reduction should be included in the data set.

ed_sas_member · Posted 12-14-2019 06:10 AM

Hi @greg6363

It seems that your input data have more than one record with the same couple of values for ID1 and ID2.

That's why SAS in unable to transpose properly the data.

Is it something which is considered as normal in your input data? In this case, how do you what result do you expect when there are 2 observations with the same ID1 and ID2?

Best,

Tom · Posted 12-14-2019 02:31 PM

Please copy the lines from the SAS log and paste into your message. Make sure to use the Insert Code button, looks like {i}, so that the formatting is preserved.

Also post example data and desired output. Preferable as code that can be run to recreate the datasets.

It looks like you are trying to do this:

data have;
  input id1 id2 $ var1 var2 ;
cards;
1 new_name 10 20
2 new_name 30 40 
;
proc transpose data=have out=want ;
  by id1 ;
  id id2 ;
  var var1-var2;
run;

To produce this:

Obs    id1    _NAME_    new_name

 1      1      var1        10
 2      1      var2        20
 3      2      var1        30
 4      2      var2        40

Now if you have multiple observations per by group (ID1 values) then each one needs to have a distinct value of ID2 so that they will become a new variable in the output. If not you have duplicates like this:

data have;
  input id1 id2 $ var1 var2 ;
cards;
1 new_name 10 20
2 new_name 30 40 
2 new_name 50 60 
;

Then you get this error message:

112   proc transpose data=have out=want ;
113     by id1 ;
114     id id2 ;
115     var var1-var2;
116   run;

ERROR: The ID value "new_name" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      id1=2
WARNING: 1 BY groups omitted due to earlier errors.
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.

You can either modify ID2 (or add more variables to the ID statement) so that each row in each by group as a unique name. Or just remove the ID statement and let SAS make up the names.

Obs    id1    _NAME_    COL1    COL2

 1      1      var1      10       .
 2      1      var2      20       .
 3      2      var1      30      50
 4      2      var2      40      60

You can use the PREFIX option on the PROC statement to tell to use something other than COL as the basis for the new names.

Or you could add more variables to the BY statement so that each observation is uniquely identified. Then each observations will produce values for only one variable.

greg6363 · Posted 12-16-2019 11:08 AM

Let me give an example of the data set and the expected layout of the output:

ID1 ID2 ID3 Date1 Amount Date2 Code1 Date3 Action1 Indicator1 Indicator2

6666 7777 LP219 7/20/19 $100 8/1/19 SDA 8/4/19 007 1 0

6666 7777 SN015 7/21/19 $200 8/4/19 BND 8/15/19 192 1 0

6666 7777 BP123 7/24/19 $250 8/12/19 COS 8/18/19 308 1 0

6666 7777 CD954 7/25/19 $150 8/15/19 DTE 8/22/19 482 1 0

6666 7777 FF773 7/28/19 $125 8/24/19 RUI 8/28/19 732 1 0

The ID1 and ID2 fields are the same for each observation which is why I included both fields in the by statement. All the other variables have different values which I want to transpose onto one observation line.

The output (with the respective variables) after the transpose should look like this layout:

ID1 ID2

ID3_1 Date1_1 Amount_1 Date2_1 Code1_1 Date3_1 Action1_1 Indicator1_1 Indicator2_1

ID3_2 Date1_2 Amount_2 Date2_2 Code1_2 Date3_2 Action1_2 Indicator1_2 Indicator2_2

ID3_3 Date1_3 Amount_3 Date2_3 Code1_3 Date3_3 Action1_3 Indicator1_3 Indicator2_3

ID3_4 Date1_3 Amount_4 Date2_4 Code1_4 Date3_4 Action1_4 Indicator1_4 Indicator2_4

ID3_5 Date1_3 Amount_5 Date2_5 Code1_5 Date3_5 Action1_5 Indicator1_5 Indicator2_5

So the unique record starts at the ID3 field which is how I want the record to be arranged in the output.

I'm trying to see if I can create an array that will give me this particular output layout.

Any feedback is greatly appreciated. Thanks.

JeanDo · Posted 12-16-2019 11:54 AM

Hi,

I don't get why do you want ID1 and ID2 in the first row, but you can't do that with a single proc transpose due to duplicates. You can try out the following code:

/* We transpose the variables */
proc transpose data=have out=want(drop=_name_) ;
   var ID3 Date1 Amount Date2 Code1 Date3 Action1  Indicator1  Indicator2;
 run;

/* We copy ID1 and ID2 in the first line, twofirst variables of the new dataset */
 data wanted (drop=id1 id2);
 set have (keep=ID1 ID2 obs=1) want;
 if _n_=1 then do;
	col1=ID1;
	col2=ID2;
	output;
end;
else output;
run;

Regards,

greg6363 · Posted 12-16-2019 01:48 PM

Unfortunately, this code creates one long string and doesn't separate by ID1 and ID2.

Tom · Posted 12-16-2019 12:15 PM

So you need to do a double transpose. But first you need to add a variable so that each existing rows can be uniquely identified.

data have ;
 input ID1 ID2 ID3 $ Date1 :mmddyy. Amount :comma. Date2 :mmddyy. Code1 $ Date3 :mmddyy. Action1 Indicator1 Indicator2 ;
cards;
6666 7777 LP219 7/20/19 $100 8/1/19 SDA 8/4/19 007 1 0
6666 7777 SN015 7/21/19 $200 8/4/19 BND 8/15/19 192 1 0
6666 7777 BP123 7/24/19 $250 8/12/19 COS 8/18/19 308 1 0
6666 7777 CD954 7/25/19 $150 8/15/19 DTE 8/22/19 482 1 0
6666 7777 FF773 7/28/19 $125 8/24/19 RUI 8/28/19 732 1 0
;

data step1;
 set have;
 by id1 id2;
 if first.id2 then row=0;
 row+1;
run;

So transpose it once to get it into one tall skinny table.

proc transpose data=step1 out=step2;
  by id1 id2 row ;
run;

Then transpose it again to get it into the wide-wide table format you requested. You can add a SORT step to change the order that the variables get created if you want. Note that you will need to add a FORMAT statement to attach any needed formats (like for your date varaibles).

proc sort data=step2;
  by id1 id2 _name_ row;
run;

proc transpose data=step2 out=want(drop=_name_) delim=_;
  by id1 id2 ;
  id _name_ row ;
  format date: date9. ;
run;

Lets print the example to see what it looks like:

170   proc print data=want width=min;
171     id id1 id2 ;
172   run;

NOTE: There were 1 observations read from the data set WORK.WANT.

          Action1_ Action1_ Action1_ Action1_ Action1_
 ID1  ID2     1        2        3        4        5    Amount_1 Amount_2 Amount_3 Amount_4 Amount_5

6666 7777     7       192      308      482      732      100      200      250      150      125


 ID1  ID2  Date1_1   Date1_2   Date1_3   Date1_4   Date1_5   Date2_1   Date2_2   Date2_3   Date2_4

6666 7777 20JUL2019 21JUL2019 24JUL2019 25JUL2019 28JUL2019 01AUG2019 04AUG2019 12AUG2019 15AUG2019

                                                                      Indicator1_ Indicator1_ Indicator1_
 ID1  ID2  Date2_5   Date3_1   Date3_2   Date3_3   Date3_4   Date3_5       1           2           3

6666 7777 24AUG2019 04AUG2019 15AUG2019 18AUG2019 22AUG2019 28AUG2019      1           1           1

          Indicator1_  Indicator1_  Indicator2_  Indicator2_  Indicator2_  Indicator2_  Indicator2_
 ID1  ID2      4            5            1            2            3            4            5

6666 7777      1            1            0            0            0            0            0

greg6363 · Posted 12-16-2019 01:08 PM

The code does create a single observation but when I generate the output, I still have four fields missing (ID3, Date2, Code1 and Date3). Do I need to perform one more transpose for these variables? Please advise. Thanks.

Tom · Posted 12-16-2019 01:16 PM

@greg6363 wrote:

No, no. You're splitting the variables up. I want to transpose the entire string from each observation while keeping it intact.

That does not make any sense. What string are you talking about?

Tom · Posted 12-16-2019 01:46 PM

@greg6363 wrote:

The code does create a single observation but when I generate the output, I still have four fields missing (ID3, Date2, Code1 and Date3). Do I need to perform one more transpose for these variables? Please advise. Thanks.

Proc transpose without a VAR statement will only work on numeric variables. Are the missing variables character?

You might need to replicate the process for the numeric and character variables separately in the first PROC TRANSPOSE. To get just the character variables add a var statement like:

VAR _character_;

Depending on whether ID1 and ID2 are character or numeric you might need to filter their value out at some point from the transposed data.

There is also some useful macros that have been posted that make this process easier.

Check with @art297 .

greg6363 · Posted 12-16-2019 02:08 PM

Yes, the other fields are character formatted so let me make the adjustment on my end and let you know the outcome. Thanks.

greg6363 · Posted 12-16-2019 02:33 PM

Could I go back to the data set and convert all the fields (even the dates) into text fields and use the var _character_ statement? Let me know. Thanks.

Tom · Posted 12-16-2019 02:37 PM

PROC TRANSPOSE will convert them. Try using VAR _ALL_ ;

SAS Innovate 2025: Call for Content

Classroom Training Available!