Hi,
I have a requirement on custom transpose. I have duplicate account numbers (Column 1) and it’s corresponding indicators (Column 2). I have to transpose the indicators in the way they are mentioned in Column3. For Example, rows 1, 8, 15, 22 and 23 have plain indicators. So, in column3, those indicators alone should be transposed. Rows 2, 9 and 16 have M indicators. So, those rows should get all the indicators with commas in between.
Can you please let me know whether there is any method to achieve this in SAS?
Row | Column1 | Column2 | Column3 |
1 | 52212227 | 15 | 15,23,24,27,30 |
2 | 52212227 | 15 -M | 15-M,23-M,24-M |
3 | 52212227 | 15 -TS | 15-TS,23-TS,24-TS |
4 | 52212227 | 15 -PR | 15-PR,23-PR,24-PR |
5 | 52212227 | 15 -HQPAR | 15-HQPAR,23-HQPAR,24-HQPAR |
6 | 52212227 | 15 -DU | 15-DU,23-DU,24-DU |
7 | 52212227 | 15 -GU | 15-GU,23-GU,24-GU |
8 | 52212227 | 23 | 15,23,24,27,30 |
9 | 52212227 | 23 -M | 15-M,23-M,24-M |
10 | 52212227 | 23 -TS | 15-TS,23-TS,24-TS |
11 | 52212227 | 23 -PR | 15-PR,23-PR,24-PR |
12 | 52212227 | 23 -HQPAR | 15-HQPAR,23-HQPAR,24-HQPAR |
13 | 52212227 | 23 -DU | 15-DU,23-DU,24-DU |
14 | 52212227 | 23 -GU | 15-GU,23-GU,24-GU |
15 | 52212227 | 24 | 15,23,24,27,30 |
16 | 52212227 | 24 -M | 15-M,23-M,24-M |
17 | 52212227 | 24 -TS | 15-TS,23-TS,24-TS |
18 | 52212227 | 24 -PR | 15-PR,23-PR,24-PR |
19 | 52212227 | 24 -HQPAR | 15-HQPAR,23-HQPAR,24-HQPAR |
20 | 52212227 | 24 -DU | 15-DU,23-DU,24-DU |
21 | 52212227 | 24 -GU | 15-GU,23-GU,24-GU |
22 | 52212227 | 27 | 15,23,24,27,30 |
23 | 52212227 | 30 | 15,23,24,27,30 |
Very similar to Xia keshan, I had it typed earlier at work, so just pasting here after coming home for the sake of it-
data have1;
set have;
length grp $10;
if notdigit(strip(scan(column2,-1,'-')))=0 then grp='1';
else grp=strip(scan(column2,-1,'-'));
run;
proc sort data=have1;
by column1 grp;
run;
data want;
set have1;
by column1 grp;
length column3 $50;
retain column3;
if first.column1 or first.grp then call missing(column3);
column3=catx(',',column3,column2);
if last.grp then output;
drop grp;
run;
Regards,
Naveen Srinivasan
Please clarify what the input and output data sets should look like, for example include a smaller input sample and for that sample show how you would like the output to be created.
Reeza,
Thanks for your reply!!
My input is Column1 and Column2 and expected output is Column1 and Column3.
Input:
Row | Column1 | Column2 |
1 | 52212227 | 15 |
2 | 52212227 | 15 -M |
3 | 52212227 | 15 -TS |
4 | 52212227 | 15 -PR |
5 | 52212227 | 15 -HQPAR |
6 | 52212227 | 15 -DU |
7 | 52212227 | 15 -GU |
8 | 52212227 | 23 |
9 | 52212227 | 23 -M |
10 | 52212227 | 23 -TS |
11 | 52212227 | 23 -PR |
12 | 52212227 | 23 -HQPAR |
13 | 52212227 | 23 -DU |
14 | 52212227 | 23 -GU |
15 | 52212227 | 24 |
16 | 52212227 | 24 -M |
17 | 52212227 | 24 -TS |
18 | 52212227 | 24 -PR |
19 | 52212227 | 24 -HQPAR |
20 | 52212227 | 24 -DU |
21 | 52212227 | 24 -GU |
22 | 52212227 | 27 |
23 | 52212227 | 30 |
Expected Output:
Row | Column1 | Column3 |
1 | 52212227 | 15,23,24,27,30 |
2 | 52212227 | 15-M,23-M,24-M |
3 | 52212227 | 15-TS,23-TS,24-TS |
4 | 52212227 | 15-PR,23-PR,24-PR |
5 | 52212227 | 15-HQPAR,23-HQPAR,24-HQPAR |
6 | 52212227 | 15-DU,23-DU,24-DU |
7 | 52212227 | 15-GU,23-GU,24-GU |
8 | 52212227 | 15,23,24,27,30 |
9 | 52212227 | 15-M,23-M,24-M |
10 | 52212227 | 15-TS,23-TS,24-TS |
11 | 52212227 | 15-PR,23-PR,24-PR |
12 | 52212227 | 15-HQPAR,23-HQPAR,24-HQPAR |
13 | 52212227 | 15-DU,23-DU,24-DU |
14 | 52212227 | 15-GU,23-GU,24-GU |
15 | 52212227 | 15,23,24,27,30 |
16 | 52212227 | 15-M,23-M,24-M |
17 | 52212227 | 15-TS,23-TS,24-TS |
18 | 52212227 | 15-PR,23-PR,24-PR |
19 | 52212227 | 15-HQPAR,23-HQPAR,24-HQPAR |
20 | 52212227 | 15-DU,23-DU,24-DU |
21 | 52212227 | 15-GU,23-GU,24-GU |
22 | 52212227 | 15,23,24,27,30 |
23 | 52212227 | 15,23,24,27,30 |
@gandikk wrote:
Reeza,
Thanks for your reply!!
My input is Column1 and Column2 and expected output is Column1 and Column3.
Input:
Row Column1 Column2 1 52212227 15 2 52212227 15 -M 3 52212227 15 -TS 4 52212227 15 -PR 5 52212227 15 -HQPAR 6 52212227 15 -DU 7 52212227 15 -GU 8 52212227 23 9 52212227 23 -M 10 52212227 23 -TS 11 52212227 23 -PR 12 52212227 23 -HQPAR 13 52212227 23 -DU 14 52212227 23 -GU 15 52212227 24 16 52212227 24 -M 17 52212227 24 -TS 18 52212227 24 -PR 19 52212227 24 -HQPAR 20 52212227 24 -DU 21 52212227 24 -GU 22 52212227 27 23 52212227 30
Expected Output:
Row Column1 Column3 1 52212227 15,23,24,27,30 where do the 23, 24, 27 and 30 come from??? 2 52212227 15-M,23-M,24-M where does 15-M, 23-M and 24-M come from??? 3 52212227 15-TS,23-TS,24-TS 4 52212227 15-PR,23-PR,24-PR 5 52212227 15-HQPAR,23-HQPAR,24-HQPAR 6 52212227 15-DU,23-DU,24-DU 7 52212227 15-GU,23-GU,24-GU 8 52212227 15,23,24,27,30 9 52212227 15-M,23-M,24-M 10 52212227 15-TS,23-TS,24-TS 11 52212227 15-PR,23-PR,24-PR 12 52212227 15-HQPAR,23-HQPAR,24-HQPAR 13 52212227 15-DU,23-DU,24-DU 14 52212227 15-GU,23-GU,24-GU 15 52212227 15,23,24,27,30 16 52212227 15-M,23-M,24-M 17 52212227 15-TS,23-TS,24-TS 18 52212227 15-PR,23-PR,24-PR 19 52212227 15-HQPAR,23-HQPAR,24-HQPAR 20 52212227 15-DU,23-DU,24-DU 21 52212227 15-GU,23-GU,24-GU 22 52212227 15,23,24,27,30 23 52212227 15,23,24,27,30
You can do this by building and using a hash table. Each "item" (row) in the hash table is keyed on the the second "word" in column 2, and it keeps track of a list of correspond full column2 values. The variable word_list keep track of the original order the "word2" values were encountered. Both the hash table and word_list are cleared for re-use at the end of a column1 group:
data want2 (keep=column1 column3);
if _n_=1 then do;
length word2 $10 column3 $50;
declare hash w2();
w2.definekey('word2');
w2.definedata('column1','column3');
w2.definedone();
end;
length word_list $100;
do until (last.column1);
set have;
by column1;
word2=scan(column2,2,'-');
rc=w2.find();
if rc^=0 then do;
word_list=catx(' ',word_list,word2);
column3=column2;
end;
else column3=catx(',',column3,column2);
rc=w2.replace();
end;
do w=1 to countw(word_list);
rc=w2.find(key:scan(word_list,w,' '));
output;
end;
rc=w2.clear();
run;
I assume your data is already sorted by column1.
Hi mkeintz,
Thanks for your reply!!
I tried your code. The code is writing the last value of column2 in the duplicate group to all the values in column1. I used the below code. I couldn't get what is going wrong here. Can you please let me know?
DATA A;
INFILE INPUT1;
INPUT @01 COLUMN1 $CHAR9.
@64 COLUMN2 $CHAR10.;
DATA WANT2 (KEEP=COLUMN1 COLUMN3);
IF _N_=1 THEN DO;
LENGTH WORD2 $10 COLUMN3 $50;
DECLARE HASH W2(DATASET:'A');
W2.DEFINEKEY('COLUMN2');
W2.DEFINEDATA('COLUMN1','COLUMN2');
W2.DEFINEDONE();
END;
LENGTH WORD_LIST $100;
DO UNTIL (LAST.COLUMN1);
SET A;
BY COLUMN1;
WORD2=SCAN(COLUMN2,2,'-');
RC=W2.FIND();
IF RC^=0 THEN DO;
WORD_LIST=CATX(' ',WORD_LIST,WORD2);
COLUMN3=COLUMN2;
END;
ELSE COLUMN3=CATX(',',COLUMN3,COLUMN2);
RC=W2.REPLACE();
END;
DO W=1 TO COUNTW(WORD_LIST);
RC=W2.FIND(KEY:SCAN(WORD_LIST,W,' '));
OUTPUT;
END;
RC=W2.CLEAR();
RUN;
DATA _NULL_;SET WANT2;
FILE OUTPUT;
PUT
@1 COLUMN1 $CHAR9.
@10 COLUMN3 $CHAR50.;
my input:
052212227 | 15 |
052212227 | 15 -M |
052212227 | 15 -TS |
052212227 | 15 -PR |
052212227 | 15 -HQPAR |
052212227 | 15 -DU |
052212227 | 15 -GU |
052212227 | 23 |
052212227 | 23 -M |
052212227 | 23 -TS |
052212227 | 23 -PR |
052212227 | 23 -HQPAR |
052212227 | 23 -DU |
052212227 | 23 -GU |
052212227 | 24 |
052212227 | 24 -M |
052212227 | 24 -TS |
052212227 | 24 -PR |
052212227 | 24 -HQPAR |
052212227 | 24 -DU |
052212227 | 24 -GU |
052212227 | 27 |
052212227 | 30 |
Output I got:
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
052212227 | 30 |
Very similar to Xia keshan, I had it typed earlier at work, so just pasting here after coming home for the sake of it-
data have1;
set have;
length grp $10;
if notdigit(strip(scan(column2,-1,'-')))=0 then grp='1';
else grp=strip(scan(column2,-1,'-'));
run;
proc sort data=have1;
by column1 grp;
run;
data want;
set have1;
by column1 grp;
length column3 $50;
retain column3;
if first.column1 or first.grp then call missing(column3);
column3=catx(',',column3,column2);
if last.grp then output;
drop grp;
run;
Regards,
Naveen Srinivasan
data have;
infile cards expandtabs truncover;
input Row Column1 : $20. Column2 $20.;
cards;
1 52212227 15
2 52212227 15 -M
3 52212227 15 -TS
4 52212227 15 -PR
5 52212227 15 -HQPAR
6 52212227 15 -DU
7 52212227 15 -GU
8 52212227 23
9 52212227 23 -M
10 52212227 23 -TS
11 52212227 23 -PR
12 52212227 23 -HQPAR
13 52212227 23 -DU
14 52212227 23 -GU
15 52212227 24
16 52212227 24 -M
17 52212227 24 -TS
18 52212227 24 -PR
19 52212227 24 -HQPAR
20 52212227 24 -DU
21 52212227 24 -GU
22 52212227 27
23 52212227 30
;
run;
data temp;
set have;
length group $ 80;
temp=scan(column2,-1,'-');
if notdigit(strip(temp))=0 then group='___';
else group=temp;
run;
proc sort data=temp;
by group;
run;
data temp1;
length want $ 200;
do until(last.group);
set temp;
by group;
want=catx(',',want,column2);
end;
keep want group;
run;
data want;
merge temp temp1;
by group;
run;
proc sort data=want;by row;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.