Solved: Re: PROC TRANSPOSE

acordes · Posted 06-27-2018 08:37 AM

I thought I would control PROC TRANSPOSE well, but I'm having a problem with it.

Having NLEVELS=3 for the variable concept I would like to have create 3 columns LC, T and PET_COMM_ with the variable value.

But when executing the code I end up getting the error message

ERROR: The ID value "LC" occurs twice in el mismo grupo BY.
ERROR: The ID value "LC" occurs twice in el mismo grupo BY.
ERROR: The ID value "LC" occurs twice in el mismo grupo BY.
ERROR: The ID value "LC" occurs twice in el mismo grupo BY.
ERROR: The ID value "PET_COMM_" occurs twice in el mismo grupo BY.
ERROR: The ID value "PET_COMM_" occurs twice in el mismo grupo BY.
ERROR: The ID value "PET_COMM_" occurs twice in el mismo grupo BY.
ERROR: The ID value "PET_COMM_" occurs twice in el mismo grupo BY.

data WORK.TEST;
  infile datalines dsd truncover;
  input MODELO:$21. NAME_STAT:$33. _NAME_:$14. VALUE:32. CONCEPT:$30.;
datalines4;
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC12,-0.005,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC24,-0.01,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC36,-0.015,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC48,-0.02,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC60,-0.02,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T12,-0.005,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T24,-0.01,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T36,-0.015,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T48,-0.02,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T60,-0.02,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_12,0.5642043742,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_24,0.5041646406,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_36,0.4441249367,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_48,0.3840852925,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_60,0.3290455588,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,LC12,-0.005,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC24,-0.01,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC36,-0.015,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC48,-0.02,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC60,-0.02,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,T12,-0.005,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T24,-0.01,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T36,-0.015,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T48,-0.02,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T60,-0.02,T
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_12,0.5853021073,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_24,0.5252623736,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_36,0.4652226698,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_48,0.4051830255,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_60,0.3501432919,PET_COMM_
;;;;

PROC SORT DATA=WORK.TEST;
BY MODELO NAME_STAT CONCEPT;
RUN;

PROC TRANSPOSE DATA=WORK.TEST(DROP=_NAME_);
BY MODELO NAME_STAT;
VAR VALUE;
ID CONCEPT;
RUN;

RW9 · Posted 06-27-2018 09:42 AM

Not its not a rule, just good practice, like indenting lines within blocks, finishing macro variables with a point, ending blocks with run; etc. If I wrote my posts back in uppercase people would think I was rudely shouting at them. For your requirement, what I would suggest is an array and by group - you could do it in three transposes, and then merge together, but one datastep will suffice:

data WORK.TEST;
  infile datalines dsd truncover;
  input modelo:$21. name_stat:$33. _name_:$14. value:32. concept:$30.;
datalines;
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC12,-0.005,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC24,-0.01,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC36,-0.015,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC48,-0.02,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC60,-0.02,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T12,-0.005,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T24,-0.01,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T36,-0.015,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T48,-0.02,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T60,-0.02,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_12,0.5642043742,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_24,0.5041646406,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_36,0.4441249367,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_48,0.3840852925,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_60,0.3290455588,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,LC12,-0.005,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC24,-0.01,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC36,-0.015,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC48,-0.02,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC60,-0.02,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,T12,-0.005,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T24,-0.01,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T36,-0.015,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T48,-0.02,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T60,-0.02,T
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_12,0.5853021073,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_24,0.5252623736,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_36,0.4652226698,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_48,0.4051830255,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_60,0.3501432919,PET_COMM_
;
run;

data test;
  set test;
  ord=input(compress(_name_,,"kd"),best.);
run;

proc sort data=test;
  by modelo name_stat ord concept;
run;

data want (keep=modelo name_stat ord lc t pet_comm_);
  set test;
  retain lc t pet_comm_;
  by modelo name_stat ord;
  if first.ord then call missing(lc,t,pet_comm_);
  if concept="LC" then lc=value;
  if concept="T" then t=value;
  if concept="PET_COMM_" then pet_comm_=value;
  if last.ord then output; 
run;

Do note that I have created a new variable called ord, this is necessary otherwise there is not way of knowing which records should all be on the same line, I assume that LC12, should appear on the same row as T12 for instance.

View solution in original post

RW9 · Posted 06-27-2018 08:48 AM

Look at the data you posted:

ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC12,-0.005,LC ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC24,-0.01,LC ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC36,-0.015,LC ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC48,-0.02,LC ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC60,-0.02,LC

You are transposing this data by

BY MODELO NAME_STAT;

As we can see modelo + name_stat has five records all with the same LC result. SAS cannot transpose that. You have not provided a want output so its hard to say, but perhaps a datastep before which adds an incremental number to the LC column so each are distinct would solve it:

data test;
  set test;
  by modelo name_stat concept;
  retain c;
  c=ifn(first.concept,1,c+1);
  concept=cats(concept,put(c,best.));
run;

Put that after your sort.

And please stop CODING ALL IN UPPERCASE!

acordes · Posted 06-27-2018 09:05 AM

Thanks @RW9.

Is it an unwritten rule that coding in UPPERCASE is a NO GO?

I want to have this output:

art297 · Posted 06-27-2018 09:34 AM

If your data are already in the order shown in your example, then you could use something like:

data need;
  set test;
  by modelo name_stat concept notsorted;
  if first.concept then order=1;
  else order+1;
run;

PROC SORT DATA=need;
  BY MODELO NAME_STAT order;
RUN;

PROC TRANSPOSE DATA=WORK.need out=want(DROP=_NAME_ order);
  BY MODELO NAME_STAT order;
  VAR VALUE;
  ID CONCEPT;
RUN;

Art, CEO, AnalystFinder.com

SuryaKiran · Posted 06-27-2018 09:56 AM

I like @art297 solution, but instead of assigning order values sequentially use the _NAME_ to get the order values. If there are missing values then assigning sequential numbers might result in different results.

data need;
  set test;
order=substr(_name_,length(_name_)-1,2);
run;

With missing T12 row you get this result with order values from _name_:

If using sequential order then missing value will the at last record.

Thanks,
Suryakiran

acordes · Posted 06-27-2018 10:10 AM

Thanks, very good

RW9 · Posted 06-27-2018 09:42 AM

Not its not a rule, just good practice, like indenting lines within blocks, finishing macro variables with a point, ending blocks with run; etc. If I wrote my posts back in uppercase people would think I was rudely shouting at them. For your requirement, what I would suggest is an array and by group - you could do it in three transposes, and then merge together, but one datastep will suffice:

data WORK.TEST;
  infile datalines dsd truncover;
  input modelo:$21. name_stat:$33. _name_:$14. value:32. concept:$30.;
datalines;
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC12,-0.005,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC24,-0.01,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC36,-0.015,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC48,-0.02,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,LC60,-0.02,LC
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T12,-0.005,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T24,-0.01,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T36,-0.015,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T48,-0.02,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,T60,-0.02,T
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_12,0.5642043742,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_24,0.5041646406,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_36,0.4441249367,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_48,0.3840852925,PET_COMM_
ODUMMYA COMBI FL 2017,ODUMMYA COMBI DIESEL,PET_COMM_60,0.3290455588,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,LC12,-0.005,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC24,-0.01,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC36,-0.015,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC48,-0.02,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,LC60,-0.02,LC
ODUMMYA FL 2017,ODUMMYA DIESEL,T12,-0.005,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T24,-0.01,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T36,-0.015,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T48,-0.02,T
ODUMMYA FL 2017,ODUMMYA DIESEL,T60,-0.02,T
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_12,0.5853021073,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_24,0.5252623736,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_36,0.4652226698,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_48,0.4051830255,PET_COMM_
ODUMMYA FL 2017,ODUMMYA DIESEL,PET_COMM_60,0.3501432919,PET_COMM_
;
run;

data test;
  set test;
  ord=input(compress(_name_,,"kd"),best.);
run;

proc sort data=test;
  by modelo name_stat ord concept;
run;

data want (keep=modelo name_stat ord lc t pet_comm_);
  set test;
  retain lc t pet_comm_;
  by modelo name_stat ord;
  if first.ord then call missing(lc,t,pet_comm_);
  if concept="LC" then lc=value;
  if concept="T" then t=value;
  if concept="PET_COMM_" then pet_comm_=value;
  if last.ord then output; 
run;

Do note that I have created a new variable called ord, this is necessary otherwise there is not way of knowing which records should all be on the same line, I assume that LC12, should appear on the same row as T12 for instance.

acordes · Posted 06-27-2018 09:50 AM

Awesome. AWESOME 🙂

Ok, I understand your code and it's a fine solution.

And regarding the good coding practice I didn't realize that it could be offensive. For my eyes it's cleaner to have all in uppercase.

Bye, Arne

acordes · Posted 06-27-2018 10:12 AM

I post as well the proc sql alternative to your array solution:

data test;
  set test;
  ord=input(compress(_name_,,"kd"),best.);
run;

proc sort data=test;
  by modelo name_stat ord concept;
run;

proc sql;
create table work.test3 as
select modelo, name_stat, ord,
sum (case when concept = "lc" then value else 0 end) as lc format percent9.2,
sum (case when concept = "t" then value else 0 end) as t format percent9.2, 
sum (case when concept = "pet_comm_" then value else 0 end) as pet_comm_ format percent9.2
from test
group by modelo, name_stat, ord;
quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away