BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
greg6363
Obsidian | Level 7

I am attempting to combine multiple account records with multiple variables into one observation with the TRANSPOSE procedure but I keep getting the following error:

 

ERROR: The ID1 value "_77144" occurs twice in the same BY group.

ERROR: Too many bad BY groups.

 

The log generates repeated errors for the ID1 field.

 

Here is my code:

 

proc transpose
data=libname.system_data
out=libname.revised_system_data;
by ID1; 
id ID2;
var ID3 Date1 Amount Date2 Code1 Date3 Action1 Indicator1 Indicator2; (these are the field values that need to be transposed)
run;

 

ID1 and ID2 have different values but will be the same within each observation.  

 

Do I need to separate the variables with an individual transpose for each one or will a do loop or arrays work within this procedure?

 

Any assistance would be greatly appreciated.  Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
greg6363
Obsidian | Level 7

I added the var COL1 code to the second PROC TRANSPOSE statement and it produced the output in the layout that I need in order to facilitate reporting.  I'm set.  I'll look at that macro to automate the process.  Thanks for the assistance. 

View solution in original post

20 REPLIES 20
JeanDo
Obsidian | Level 7

Hello,

Can you give an example of your input dataset, and what you are expecting as an output?

 

JD

Salami
Calcite | Level 5

Given below in Data Set A are the initial LDL cholesterol level of each individual before receiving the drug or the placebo and the LDL reading 3 months after being placed on the drug/placebo. The patients are identified by a three digit code. After the patient code the LDL level of the patient is given, followed by a code that identifies if the LDL reading is the pre-drug/placebo reading (PRE) at zero months or the reading taken 3 months later (POST). Please note that two of the patients did not show up for the post drug/placebo testing.

 

DATA SET A

 

101  130  PRE  

101  125  POST

102  180  PRE

102  120  POST

103  165  PRE

103  115  POST

104  145  PRE

104  150  POST

105  190  PRE

105  180  POST

106  125  PRE

106   70   POST

107  155  PRE

107  145  POST

108  175  PRE

108  110  POST

109  145  PRE

109   90   POST

110  120  PRE

110  125  POST

111  155  PRE

112  165  PRE

 

 

 

 

 

 

 

 

 

 

Below you will find the information giving whether the patient receive the drug (D) or the Placebo (P).

 

DATA SET B

 

101   P

102   D

103   D

104   P

105   P

106   D

107   P

108   D

109   D

110   P

111   P

112   D

 

 

  • Using only the SAS data sets you created in Part (a), create a data set that contains individual ID, before and after LDL levels,  percentage reduction in the LDL level, and the whether each individual belongs to the Drug group or the Placebo group . Include SAS commands to print out this data set. No variables other than ID, Drug/placebo status, LDL level at zero months (PRE level), LDL level at 3 months (POST level), and Percentage Reduction should be included in the data set.
ed_sas_member
Meteorite | Level 14

Hi @greg6363 

 

It seems that your input data have more than one record with the same couple of values for ID1 and ID2.

That's why SAS in unable to transpose properly the data.

 

Is it something which is considered as normal in your input data? In this case, how do you what result do you expect when there are 2 observations with the same ID1 and ID2?

 

Best,

Tom
Super User Tom
Super User

Please copy the lines from the SAS log and paste into your message.  Make sure to use the Insert Code button, looks like {i}, so that the formatting is preserved.

 

Also post example data and desired output. Preferable as code that can be run to recreate the datasets.

 

It looks like you are trying to do this:

data have;
  input id1 id2 $ var1 var2 ;
cards;
1 new_name 10 20
2 new_name 30 40 
;
proc transpose data=have out=want ;
  by id1 ;
  id id2 ;
  var var1-var2;
run;

To produce this:

Obs    id1    _NAME_    new_name

 1      1      var1        10
 2      1      var2        20
 3      2      var1        30
 4      2      var2        40

Now if you have multiple observations per by group (ID1 values) then each one needs to have a distinct value of ID2 so that they will become a new variable in the output.  If not you have duplicates like this:

data have;
  input id1 id2 $ var1 var2 ;
cards;
1 new_name 10 20
2 new_name 30 40 
2 new_name 50 60 
;

Then you get this error message:

112   proc transpose data=have out=want ;
113     by id1 ;
114     id id2 ;
115     var var1-var2;
116   run;

ERROR: The ID value "new_name" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      id1=2
WARNING: 1 BY groups omitted due to earlier errors.
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.

You can either modify ID2 (or add more variables to the ID statement) so that each row in each by group as a unique name.  Or just remove the ID statement and let SAS make up the names.

Obs    id1    _NAME_    COL1    COL2

 1      1      var1      10       .
 2      1      var2      20       .
 3      2      var1      30      50
 4      2      var2      40      60

You can use the PREFIX option on the PROC statement to tell to use something other than COL as the basis for the new names.

 

Or you could add more variables to the BY statement so that each observation is uniquely identified.  Then each observations will produce values for only one variable.

greg6363
Obsidian | Level 7

Let me give an example of the data set and the expected layout of the output:

 

ID1       ID2    ID3       Date1    Amount  Date2  Code1  Date3  Action1   Indicator1  Indicator2

6666   7777  LP219  7/20/19   $100      8/1/19   SDA    8/4/19     007           1                 0

6666   7777  SN015  7/21/19   $200     8/4/19   BND    8/15/19   192           1                 0

6666   7777  BP123  7/24/19   $250     8/12/19  COS   8/18/19   308           1                 0

6666   7777  CD954  7/25/19   $150     8/15/19  DTE   8/22/19   482           1                 0

6666   7777  FF773   7/28/19   $125    8/24/19   RUI    8/28/19   732           1                 0

 

The ID1 and ID2 fields are the same for each observation which is why I included both fields in the by statement.  All the other variables have different values which I want to transpose onto one observation line.  

 

The output (with the respective variables) after the transpose should look like this layout:

 

ID1  ID2 

 

ID3_1  Date1_1  Amount_1  Date2_1  Code1_1  Date3_1  Action1_1  Indicator1_1  Indicator2_1

 

ID3_2  Date1_2  Amount_2  Date2_2  Code1_2  Date3_2  Action1_2  Indicator1_2  Indicator2_2

 

ID3_3  Date1_3  Amount_3  Date2_3  Code1_3  Date3_3  Action1_3  Indicator1_3  Indicator2_3

 

ID3_4  Date1_3  Amount_4  Date2_4  Code1_4  Date3_4  Action1_4  Indicator1_4  Indicator2_4

 

ID3_5  Date1_3  Amount_5  Date2_5  Code1_5  Date3_5  Action1_5  Indicator1_5  Indicator2_5

 

So the unique record starts at the ID3 field which is how I want the record to be arranged in the output.

I'm trying to see if I can create an array that will give me this particular output layout.

 

Any feedback is greatly appreciated.  Thanks.

JeanDo
Obsidian | Level 7

Hi, 

I don't get why do you want ID1 and ID2 in the first row, but you can't do that with a single proc transpose due to duplicates. You can try out the following code:

 

/* We transpose the variables */
proc transpose data=have out=want(drop=_name_) ;
   var ID3 Date1 Amount Date2 Code1 Date3 Action1  Indicator1  Indicator2;
 run;

/* We copy ID1 and ID2 in the first line, twofirst variables of the new dataset */
 data wanted (drop=id1 id2);
 set have (keep=ID1 ID2 obs=1) want;
 if _n_=1 then do;
	col1=ID1;
	col2=ID2;
	output;
end;
else output;
run;

Regards,

 

greg6363
Obsidian | Level 7

Unfortunately, this code creates one long string and doesn't separate by ID1 and ID2.

Tom
Super User Tom
Super User

So you need to do a double transpose. But first you need to add a variable so that each existing rows can be uniquely identified.

data have ;
 input ID1 ID2 ID3 $ Date1 :mmddyy. Amount :comma. Date2 :mmddyy. Code1 $ Date3 :mmddyy. Action1 Indicator1 Indicator2 ;
cards;
6666 7777 LP219 7/20/19 $100 8/1/19 SDA 8/4/19 007 1 0
6666 7777 SN015 7/21/19 $200 8/4/19 BND 8/15/19 192 1 0
6666 7777 BP123 7/24/19 $250 8/12/19 COS 8/18/19 308 1 0
6666 7777 CD954 7/25/19 $150 8/15/19 DTE 8/22/19 482 1 0
6666 7777 FF773 7/28/19 $125 8/24/19 RUI 8/28/19 732 1 0
;

data step1;
 set have;
 by id1 id2;
 if first.id2 then row=0;
 row+1;
run;

So transpose it once to get it into one tall skinny table.

proc transpose data=step1 out=step2;
  by id1 id2 row ;
run;

Then transpose it again to get it into the wide-wide table format you requested.  You can add a SORT step to change the order that the variables get created if you want.  Note that you will need to add a FORMAT statement to attach any needed formats (like for your date varaibles).

proc sort data=step2;
  by id1 id2 _name_ row;
run;

proc transpose data=step2 out=want(drop=_name_) delim=_;
  by id1 id2 ;
  id _name_ row ;
  format date: date9. ;
run;

Lets print the example to see what it looks like:

170   proc print data=want width=min;
171     id id1 id2 ;
172   run;

NOTE: There were 1 observations read from the data set WORK.WANT.
          Action1_ Action1_ Action1_ Action1_ Action1_
 ID1  ID2     1        2        3        4        5    Amount_1 Amount_2 Amount_3 Amount_4 Amount_5

6666 7777     7       192      308      482      732      100      200      250      150      125


 ID1  ID2  Date1_1   Date1_2   Date1_3   Date1_4   Date1_5   Date2_1   Date2_2   Date2_3   Date2_4

6666 7777 20JUL2019 21JUL2019 24JUL2019 25JUL2019 28JUL2019 01AUG2019 04AUG2019 12AUG2019 15AUG2019

                                                                      Indicator1_ Indicator1_ Indicator1_
 ID1  ID2  Date2_5   Date3_1   Date3_2   Date3_3   Date3_4   Date3_5       1           2           3

6666 7777 24AUG2019 04AUG2019 15AUG2019 18AUG2019 22AUG2019 28AUG2019      1           1           1

          Indicator1_  Indicator1_  Indicator2_  Indicator2_  Indicator2_  Indicator2_  Indicator2_
 ID1  ID2      4            5            1            2            3            4            5

6666 7777      1            1            0            0            0            0            0

 

greg6363
Obsidian | Level 7

The code does create a single observation but when I generate the output, I still have four fields missing (ID3, Date2, Code1 and Date3).  Do I need to perform one more transpose for these variables?  Please advise.  Thanks.

 

 

Tom
Super User Tom
Super User

@greg6363 wrote:

No, no.  You're splitting the variables up.  I want to transpose the entire string from each observation while keeping it intact.  


That does not make any sense.  What string are you talking about?  

Tom
Super User Tom
Super User

@greg6363 wrote:

The code does create a single observation but when I generate the output, I still have four fields missing (ID3, Date2, Code1 and Date3).  Do I need to perform one more transpose for these variables?  Please advise.  Thanks.

 

 


Proc transpose without a VAR statement will only work on numeric variables.  Are the missing variables character?

You might need to replicate the process for the numeric and character variables separately in the first PROC TRANSPOSE. To get just the character variables add a var statement like:

VAR _character_;

Depending on whether ID1 and ID2 are character or numeric you might need to filter their value out at some point from the transposed data.

 

There is also some useful macros that have been posted that make this process easier. 

Check with @art297 .

greg6363
Obsidian | Level 7

Yes, the other fields are character formatted so let me make the adjustment on my end and let you know the outcome.  Thanks.

greg6363
Obsidian | Level 7

Could I go back to the data set and convert all the fields (even the dates) into text fields and use the var _character_ statement?  Let me know.  Thanks.

Tom
Super User Tom
Super User
PROC TRANSPOSE will convert them. Try using VAR _ALL_ ;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 20 replies
  • 33770 views
  • 1 like
  • 6 in conversation