BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

Hello,

 

I am having an issue running proc transpose on my data. Below are sample data:

 

ID

Variable

Variable_Num

question_text_short

answer

entry_count

1

Gen_Health

1

General health

Fair

1

1

Gen_Health

1

General health

Fair

2

1

Gen_Health

1

General health

Fair

3

1

GAD

83

Anxiety

Several days

1

1

GAD

83

Anxiety

Some days

2

1

GAD

83

Anxiety

Every day

3

1

PSS

69

Stress

Sometimes

1

1

PSS

69

Stress

Sometimes

2

1

PSS

69

Stress

Never

3

 

As you can see, the variable_num corresponds to the Variable. Entry_count refers to the time of the entry. I want each Variable to be a separate column. In other words, I want the data structure to be the following:

 

ID

entry_count

Gen_Health

GAD

PSS

1

1

Fair

Several days

Sometimes

1

2

Fair

Some days

Sometimes

1

3

Fair

Every day

Never

 

My code is the following:

 

proc transpose data=data out=wide prefix=answer;
    by entry_count;
    id Variable;
    var answer;
run;

 

 

But the log fills with errors such as "ERROR: The ID value "answerGAD" occurs twice in the same BY group." No dataset is produced. I tried both "Variable" and "Variable_Num" for the "id" in the code. Can someone assist? I even tried shortening the prefix to "a" after reading another message on this forum that the prefix was too long, but I still get the same error. Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Change the sort order to ID Entry_count and use the following:

proc transpose data=data out=wide ;
    by id entry_count;
    id Variable;
    idlabel question_text_short;
    var answer;
run;

View solution in original post

12 REPLIES 12
ballardw
Super User

Change the sort order to ID Entry_count and use the following:

proc transpose data=data out=wide ;
    by id entry_count;
    id Variable;
    idlabel question_text_short;
    var answer;
run;
confooseddesi89
Quartz | Level 8

EDITED: Never mind, I had forgotten a semicolon! ballardw's solution worked!

 

 

Hello,

 

Upon running the code, I get the following error:

 

ERROR: Variable IDLABEL not found.

 

It seems as though SAS does not recognize the IDLABEL command.

 

 

PaigeMiller
Diamond | Level 26

Show us the ENTIRE log for PROC TRANSPOSE. Please copy the log as text and paste it into the window that appears when you click on the </> icon. Always (that's ALWAYS) do this when you have an error in the log.

PaigeMiller_0-1663012019648.png

--
Paige Miller
confooseddesi89
Quartz | Level 8

I can't post the entire thing. Communites.sas.com says it's too long; plus, I had to force clear the window in SAS because it was too long for my window.

 

Other than what I already pasted before, here's the other information:

 

ERROR: The ID value "a17" occurs twice in the same BY group.
ERROR: The ID value "a17" occurs twice in the same BY group.
ERROR: The ID value "a17" occurs twice in the same BY group.
ERROR: The ID value "a17" occurs twice in the same BY group.
ERROR: The ID value "a17" occurs twice in the same BY group.
NOTE: The above message was for the following BY group:
      entry_count=9
ERROR: The ID value "a10" occurs twice in the same BY group.
ERROR: The ID value "a10" occurs twice in the same BY group.
ERROR: Too many bad BY groups.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 317016 observations read from the data set DATA.
WARNING: The data set WORK.WIDE may be incomplete.  When this step was stopped there were 0
         observations and 0 variables.
WARNING: Data set WORK.WIDE was not replaced because this step was stopped.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
      real time           1:07:21.06
      cpu time            1:17.46
confooseddesi89
Quartz | Level 8

Oh wow I feel like a you-know-what! Haha thanks! This worked great.

pkm_edu
Quartz | Level 8

I am trying to create a wide data set with a unique DUPERSID per line and the RXNDC and RXNAME variable values  on the same row from a long data set with  DUPERSID, RXNDC and RXNAME values  in varying number of rows. Below are my  SAS code including datalines and the SAS output.

data testdata;
infile datalines truncover;
input DUPERSID :$10.    RXNDC  :$11.   RXNAME :$50.;
datalines;
2320018102     -15     ANTICOAGULANTS
2320018102     -15     ANTICOAGULANTS
2320038101     -15     ANTICOAGULANTS
2320038101     -15     ANTICOAGULANTS
2320038101     -15     ANTICOAGULANTS
2320105101     -15     INSULIN
2320105101     -15     INSULIN
2320105101     -15     INSULIN
2320105101     -15     INSULIN
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
2320222101    73070010011    INSULIN ASPA
;
proc print data=testdata; run;
proc sort data=WORK.testdata; by DUPERSID; run;

proc transpose data=testdata out=work.wide_IC prefix=RX;
 by DUPERSID ;
 var  rxndc rxname;
 run;

PROC PRINT DATA= WORK.wide_IC ;
RUN;


Output

Obs DUPERSID _NAME_ RX1 RX2 RX3 RX4 RX5 RX6 RX7 RX8

1 2320018102 RXNDC -15 -15
2 2320018102 RXNAME ANTICOAGULANTS ANTICOAGULANTS
3 2320038101 RXNDC -15 -15 -15
4 2320038101 RXNAME ANTICOAGULANTS ANTICOAGULANTS ANTICOAGULANTS
5 2320105101 RXNDC -15 -15 -15 -15
6 2320105101 RXNAME INSULIN INSULIN INSULIN INSULIN
7 2320222101 RXNDC 73070010011 73070010011 73070010011 73070010011 73070010011 73070010011 73070010011 73070010011
8 2320222101 RXNAME INSULIN INSULIN INSULIN INSULIN INSULIN INSULIN INSULIN INSULIN

Issue: The wide data set has the RXNDC and RXNAME values in separate lines rather than the same line.

 

To further widen the  the above output data, I used the following SAS code (with double transposition).

proc sort data=WORK.wide_IC; by DUPERSID _NAME_; run;

proc transpose data=WORK.wide_IC out=work.narrow_IC ;
 by DUPERSID _NAME_ ;
 var  RX:;
run;

proc sort data=WORK.narrow_IC; by DUPERSID; run;
proc transpose data=WORK.narrow_IC out=work.wide2_IC;
 by DUPERSID ;
 var  COL1;
 id _NAME_;
 run;

I am getting the following errors.
ERROR: The ID value "RXNAME" occurs twice in the same BY group.
ERROR: The ID value "RXNAME" occurs twice in the same BY group.

Question: What changes would I make to my code to create a wide data set
so that it will have the RXNDC and RXNAME values on same line, with
one DUPERSID per line?

Any help will be appreciated.

 

ballardw
Super User

@pkm_edu 

You should start you own thread, possibly referencing this one. Your issue is different than the original poster.

The data step shown results in values of "INSULIN" not "INSULIN ASPA".

 

Also what will you do with the resulting data set? Typically a wide set is much harder to work with.

 

Can you show exactly what you expect for output? You have so many repeated values in that example data I'm not sure that I understand what is needed or the utility of that many repeated values on an observation.

pkm_edu
Quartz | Level 8
1) Sorry for not starting a new thread. Do you want me to create a new thread?

2) Your comment: The data step shown results in values of "INSULIN" not "INSULIN ASPA".

My response: You're correct! Sorry, I could not determine why the wide data set does show "INSULIN" instead of "INSULIN ASPA" for DUPERSID 2320222101.

3) Your question/comment: Also what will you do with the resulting data set? Typically a wide set is much harder to work with.

My response: I want to count the number of prescription fills/refills at the respondent level and merge this wide data with other person-level data files.

4) Your question/comment: Can you show exactly what you expect for output? You have so many repeated values in that example data I'm not sure that I understand what is needed or the utility of that many repeated values on an observation.

My response: The input and output data (desired) are shown below.

Input data (one DYPERSID repeated twice)
DUPERSID RXNDC RXNAME1
2320018102 -15 ANTICOAGULANTS
2320018102 -15 ANTICOAGULANTS

Desired output data
DUPERSID RXNDC_RX1 RXNAME_RX1 RXNDC_RX2 RXNAME_RX2
2320018102 -15 ANTICOAGULANTS -15 ANTICOAGULANTS

Thanks,
ballardw
Super User

@pkm_edu wrote:
1) Sorry for not starting a new thread. Do you want me to create a new thread?

3) Your question/comment: Also what will you do with the resulting data set? Typically a wide set is much harder to work with.

My response: I want to count the number of prescription fills/refills at the respondent level and merge this wide data with other person-level data files.


Yes start a new thread, and your question appears to be a "how to count".

Do you need a data set with the count or a report that you can read. Very likely Proc Freq will handle either.

Consider if you want a count by person of each of your two variables this may be a start:

 

Proc freq data=work.testdata;
  /* person with separate table counts of rxndc and rxname*/
   tables dupersid*(rxndc rxname) / norow nocol nopercent;
 /* or count of both at one time*/
   tables dupersid * rxndc * rxname/ list nopercent;
run;

You can create output data sets of either type to merge back on to your other data by Dupersid.

pkm_edu
Quartz | Level 8
Thank you for your PROC FREQ data aggregation solution (Method 1 SAS code below).
I have also provided two additional solutions using PROC SUMMARY and PROC SQL (Methods 2 and 3 SAS code below).

Below is the SAS Output for aggregated data.

DUPERSID RXNDC RXNAME count
ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2320018102 -15 ANTICOAGULANTS 2
2320038101 -15 ANTICOAGULANTS 3
2320105101 -15 INSULIN 4
2320222101 73070010011 INSULIN ASPA 8

Question: What DATA step code would I write to create the output data set (e.g., example observation below) based on the above data set as input? Any hints (or references) would be appreciated.

DUPERSID RXNDC RXNAME count RXNDC1 RXNAME2 RXNDC1 RXNAME2

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ
2320018102 -15 ANTICOAGULANTS 2 -15 ANTICOAGULANTS -15 ANTICOAGULANTS



********************************************************************
data testdata;
infile datalines truncover ;
input DUPERSID :$10. RXNDC :$char11. RXNAME & $50.;
datalines;
2320018102 -15 ANTICOAGULANTS
2320018102 -15 ANTICOAGULANTS
2320038101 -15 ANTICOAGULANTS
2320038101 -15 ANTICOAGULANTS
2320038101 -15 ANTICOAGULANTS
2320105101 -15 INSULIN
2320105101 -15 INSULIN
2320105101 -15 INSULIN
2320105101 -15 INSULIN
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
2320222101 73070010011 INSULIN ASPA
;
proc print data=testdata; run;

title 'Method 1';
Proc freq data=work.testdata noprint;
tables dupersid * rxndc * rxname / list nopercent out=FreqCount (drop=percent);
run;
proc print data=work.FreqCount;
run;

title 'Method 2';
proc summary data=testdata nway ;
by dupersid rxndc rxname;
output out=want (rename=(_freq_=count)drop=_type_) ;
run;
proc print data=work.want;
run;

title 'Method 3';
proc sql;
select dupersid, rxndc, rxname, count(*) as count
from testdata
group by dupersid, rxndc, rxname;
quit;

confooseddesi89
Quartz | Level 8

Excellent! Worked perfectly.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 9794 views
  • 4 likes
  • 5 in conversation