Solved: Re: proc transpose ID value occurs twice in same BY group

confooseddesi89 · Posted 01-30-2023 04:19 PM

Hello,

I am having an issue running proc transpose on my data. Below are sample data:

ID	Variable	Variable_Num	question_text_short	answer	entry_count
1	Gen_Health	1	General health	Fair	1
1	Gen_Health	1	General health	Fair	2
1	Gen_Health	1	General health	Fair	3
1	GAD	83	Anxiety	Several days	1
1	GAD	83	Anxiety	Some days	2
1	GAD	83	Anxiety	Every day	3
1	PSS	69	Stress	Sometimes	1
1	PSS	69	Stress	Sometimes	2
1	PSS	69	Stress	Never	3

As you can see, the variable_num corresponds to the Variable. Entry_count refers to the time of the entry. I want each Variable to be a separate column. In other words, I want the data structure to be the following:

ID	entry_count	Gen_Health	GAD	PSS
1	1	Fair	Several days	Sometimes
1	2	Fair	Some days	Sometimes
1	3	Fair	Every day	Never

My code is the following:

proc transpose data=data out=wide prefix=answer;
    by entry_count;
    id Variable;
    var answer;
run;

But the log fills with errors such as "ERROR: The ID value "answerGAD" occurs twice in the same BY group." No dataset is produced. I tried both "Variable" and "Variable_Num" for the "id" in the code. Can someone assist? I even tried shortening the prefix to "a" after reading another message on this forum that the prefix was too long, but I still get the same error. Thanks.

ballardw · Posted 01-30-2023 05:18 PM

Change the sort order to ID Entry_count and use the following:

proc transpose data=data out=wide ;
    by id entry_count;
    id Variable;
    idlabel question_text_short;
    var answer;
run;

View solution in original post

ballardw · Posted 01-30-2023 05:18 PM

Change the sort order to ID Entry_count and use the following:

proc transpose data=data out=wide ;
    by id entry_count;
    id Variable;
    idlabel question_text_short;
    var answer;
run;

confooseddesi89 · Posted 01-30-2023 05:29 PM

EDITED: Never mind, I had forgotten a semicolon! ballardw's solution worked!

Hello,

Upon running the code, I get the following error:

ERROR: Variable IDLABEL not found.

It seems as though SAS does not recognize the IDLABEL command.

PaigeMiller · Posted 01-30-2023 05:31 PM

Show us the ENTIRE log for PROC TRANSPOSE. Please copy the log as text and paste it into the window that appears when you click on the </> icon. Always (that's ALWAYS) do this when you have an error in the log.

--
Paige Miller

confooseddesi89 · Posted 01-30-2023 05:40 PM

I can't post the entire thing. Communites.sas.com says it's too long; plus, I had to force clear the window in SAS because it was too long for my window.

Other than what I already pasted before, here's the other information:

ERROR: The ID value "a17" occurs twice in the same BY group.
ERROR: The ID value "a17" occurs twice in the same BY group.
ERROR: The ID value "a17" occurs twice in the same BY group.
ERROR: The ID value "a17" occurs twice in the same BY group.
ERROR: The ID value "a17" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      entry_count=9
ERROR: The ID value "a10" occurs twice in the same BY group.
ERROR: The ID value "a10" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 317016 observations read from the data set DATA.
WARNING: The data set WORK.WIDE may be incomplete.  When this step was stopped there were 0
         observations and 0 variables.
WARNING: Data set WORK.WIDE was not replaced because this step was stopped.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           1:07:21.06
      cpu time            1:17.46

Kurt_Bremser · Posted 01-30-2023 05:55 PM

That the IDLABEL statement is mistaken for a variable name points to a missing semicolon for the preceding statement.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

confooseddesi89 · Posted 01-30-2023 06:15 PM

Oh wow I feel like a you-know-what! Haha thanks! This worked great.

pkm_edu · Posted 03-28-2024 09:41 AM

I am trying to create a wide data set with a unique DUPERSID per line and the RXNDC and RXNAME variable values on the same row from a long data set with DUPERSID, RXNDC and RXNAME values in varying number of rows. Below are my SAS code including datalines and the SAS output.

data testdata;
infile datalines truncover;
input DUPERSID :$10.    RXNDC  :$11.   RXNAME :$50.;
datalines;
2320018102     -15     ANTICOAGULANTS
2320018102     -15     ANTICOAGULANTS
2320038101     -15     ANTICOAGULANTS
2320038101     -15     ANTICOAGULANTS
2320038101     -15     ANTICOAGULANTS
2320105101     -15     INSULIN
2320105101     -15     INSULIN
2320105101     -15     INSULIN
2320105101     -15     INSULIN
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
;
proc print data=testdata; run;
proc sort data=WORK.testdata; by DUPERSID; run;

proc transpose data=testdata out=work.wide_IC prefix=RX;
 by DUPERSID ;
 var  rxndc rxname;
 run;

PROC PRINT DATA= WORK.wide_IC ;
RUN;

Output

Obs DUPERSID _NAME_ RX1 RX2 RX3 RX4 RX5 RX6 RX7 RX8

1 2320018102 RXNDC -15 -15
2 2320018102 RXNAME ANTICOAGULANTS ANTICOAGULANTS
3 2320038101 RXNDC -15 -15 -15
4 2320038101 RXNAME ANTICOAGULANTS ANTICOAGULANTS ANTICOAGULANTS
5 2320105101 RXNDC -15 -15 -15 -15
6 2320105101 RXNAME INSULIN INSULIN INSULIN INSULIN
7 2320222101 RXNDC 73070010011 73070010011 73070010011 73070010011 73070010011 73070010011 73070010011 73070010011
8 2320222101 RXNAME INSULIN INSULIN INSULIN INSULIN INSULIN INSULIN INSULIN INSULIN

Issue: The wide data set has the RXNDC and RXNAME values in separate lines rather than the same line.

To further widen the the above output data, I used the following SAS code (with double transposition).
proc sort data=WORK.wide_IC; by DUPERSID _NAME_; run;

proc transpose data=WORK.wide_IC out=work.narrow_IC ;
 by DUPERSID _NAME_ ;
 var  RX:;
run;

proc sort data=WORK.narrow_IC; by DUPERSID; run;
proc transpose data=WORK.narrow_IC out=work.wide2_IC;
 by DUPERSID ;
 var  COL1;
 id _NAME_;
 run;

I am getting the following errors.
ERROR: The ID value "RXNAME" occurs twice in the same BY group.
ERROR: The ID value "RXNAME" occurs twice in the same BY group.

Question: What changes would I make to my code to create a wide data set 
so that it will have the RXNDC and RXNAME values on same line, with 
one DUPERSID per line?

Any help will be appreciated.

ballardw · Posted 03-28-2024 11:13 AM

@pkm_edu

You should start you own thread, possibly referencing this one. Your issue is different than the original poster.

The data step shown results in values of "INSULIN" not "INSULIN ASPA".

Also what will you do with the resulting data set? Typically a wide set is much harder to work with.

Can you show exactly what you expect for output? You have so many repeated values in that example data I'm not sure that I understand what is needed or the utility of that many repeated values on an observation.

pkm_edu · Posted 03-28-2024 12:18 PM

1) Sorry for not starting a new thread. Do you want me to create a new thread?

2) Your comment: The data step shown results in values of "INSULIN" not "INSULIN ASPA".

My response: You're correct! Sorry, I could not determine why the wide data set does show "INSULIN" instead of "INSULIN ASPA" for DUPERSID 2320222101.

3) Your question/comment: Also what will you do with the resulting data set? Typically a wide set is much harder to work with.

My response: I want to count the number of prescription fills/refills at the respondent level and merge this wide data with other person-level data files.

4) Your question/comment: Can you show exactly what you expect for output? You have so many repeated values in that example data I'm not sure that I understand what is needed or the utility of that many repeated values on an observation.

My response: The input and output data (desired) are shown below.

Input data (one DYPERSID repeated twice)
DUPERSID RXNDC RXNAME1
2320018102 -15 ANTICOAGULANTS
2320018102 -15 ANTICOAGULANTS

Desired output data
DUPERSID RXNDC_RX1 RXNAME_RX1 RXNDC_RX2 RXNAME_RX2
2320018102 -15 ANTICOAGULANTS -15 ANTICOAGULANTS

Thanks,

ballardw · Posted 03-28-2024 12:33 PM

@pkm_edu wrote:
1) Sorry for not starting a new thread. Do you want me to create a new thread?

3) Your question/comment: Also what will you do with the resulting data set? Typically a wide set is much harder to work with.

My response: I want to count the number of prescription fills/refills at the respondent level and merge this wide data with other person-level data files.

Yes start a new thread, and your question appears to be a "how to count".

Do you need a data set with the count or a report that you can read. Very likely Proc Freq will handle either.

Consider if you want a count by person of each of your two variables this may be a start:

Proc freq data=work.testdata;
  /* person with separate table counts of rxndc and rxname*/
   tables dupersid*(rxndc rxname) / norow nocol nopercent;
 /* or count of both at one time*/
   tables dupersid * rxndc * rxname/ list nopercent;
run;

You can create output data sets of either type to merge back on to your other data by Dupersid.

pkm_edu · Posted 03-28-2024 07:21 PM

Thank you for your PROC FREQ data aggregation solution (Method 1 SAS code below).
I have also provided two additional solutions using PROC SUMMARY and PROC SQL (Methods 2 and 3 SAS code below).

Below is the SAS Output for aggregated data.

DUPERSID RXNDC RXNAME count
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2320018102 -15 ANTICOAGULANTS 2
2320038101 -15 ANTICOAGULANTS 3
2320105101 -15 INSULIN 4
2320222101 73070010011 INSULIN ASPA 8

Question: What DATA step code would I write to create the output data set (e.g., example observation below) based on the above data set as input? Any hints (or references) would be appreciated.

DUPERSID RXNDC RXNAME count RXNDC1 RXNAME2 RXNDC1 RXNAME2

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2320018102 -15 ANTICOAGULANTS 2 -15 ANTICOAGULANTS -15 ANTICOAGULANTS

********************************************************************
data testdata;
infile datalines truncover ;
input DUPERSID :$10. RXNDC :$char11. RXNAME & $50.;
datalines;
2320018102 -15 ANTICOAGULANTS
2320018102 -15 ANTICOAGULANTS
2320038101 -15 ANTICOAGULANTS
2320038101 -15 ANTICOAGULANTS
2320038101 -15 ANTICOAGULANTS
2320105101 -15 INSULIN
2320105101 -15 INSULIN
2320105101 -15 INSULIN
2320105101 -15 INSULIN
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
;
proc print data=testdata; run;

title 'Method 1';
Proc freq data=work.testdata noprint;
tables dupersid * rxndc * rxname / list nopercent out=FreqCount (drop=percent);
run;
proc print data=work.FreqCount;
run;

title 'Method 2';
proc summary data=testdata nway ;
by dupersid rxndc rxname;
output out=want (rename=(_freq_=count)drop=_type_) ;
run;
proc print data=work.want;
run;

title 'Method 3';
proc sql;
select dupersid, rxndc, rxname, count(*) as count
from testdata
group by dupersid, rxndc, rxname;
quit;

confooseddesi89 · Posted 01-30-2023 06:15 PM

Excellent! Worked perfectly.

Classroom Training Available!