Solved: Re: Custom Transpose

gandikk · Posted 04-26-2017 01:46 PM

Hi,

I have a requirement on custom transpose. I have duplicate account numbers (Column 1) and it’s corresponding indicators (Column 2). I have to transpose the indicators in the way they are mentioned in Column3. For Example, rows 1, 8, 15, 22 and 23 have plain indicators. So, in column3, those indicators alone should be transposed. Rows 2, 9 and 16 have M indicators. So, those rows should get all the indicators with commas in between.

Can you please let me know whether there is any method to achieve this in SAS?

Row	Column1	Column2	Column3
1	52212227	15	15,23,24,27,30
2	52212227	15 -M	15-M,23-M,24-M
3	52212227	15 -TS	15-TS,23-TS,24-TS
4	52212227	15 -PR	15-PR,23-PR,24-PR
5	52212227	15 -HQPAR	15-HQPAR,23-HQPAR,24-HQPAR
6	52212227	15 -DU	15-DU,23-DU,24-DU
7	52212227	15 -GU	15-GU,23-GU,24-GU
8	52212227	23	15,23,24,27,30
9	52212227	23 -M	15-M,23-M,24-M
10	52212227	23 -TS	15-TS,23-TS,24-TS
11	52212227	23 -PR	15-PR,23-PR,24-PR
12	52212227	23 -HQPAR	15-HQPAR,23-HQPAR,24-HQPAR
13	52212227	23 -DU	15-DU,23-DU,24-DU
14	52212227	23 -GU	15-GU,23-GU,24-GU
15	52212227	24	15,23,24,27,30
16	52212227	24 -M	15-M,23-M,24-M
17	52212227	24 -TS	15-TS,23-TS,24-TS
18	52212227	24 -PR	15-PR,23-PR,24-PR
19	52212227	24 -HQPAR	15-HQPAR,23-HQPAR,24-HQPAR
20	52212227	24 -DU	15-DU,23-DU,24-DU
21	52212227	24 -GU	15-GU,23-GU,24-GU
22	52212227	27	15,23,24,27,30
23	52212227	30	15,23,24,27,30

SAS@EMMAUS

novinosrin · Posted 04-27-2017 11:12 AM

Very similar to Xia keshan, I had it typed earlier at work, so just pasting here after coming home for the sake of it-

data have1;

set have;

length grp $10;

if notdigit(strip(scan(column2,-1,'-')))=0 then grp='1';

else grp=strip(scan(column2,-1,'-'));

run;

proc sort data=have1;

by column1 grp;

run;

data want;

set have1;

by column1 grp;

length column3 $50;

retain column3;

if first.column1 or first.grp then call missing(column3);

column3=catx(',',column3,column2);

if last.grp then output;

drop grp;

run;

Regards,

Naveen Srinivasan

View solution in original post

Reeza · Posted 04-26-2017 01:49 PM

Please clarify what the input and output data sets should look like, for example include a smaller input sample and for that sample show how you would like the output to be created.

gandikk · Posted 04-26-2017 02:23 PM

Reeza,

Thanks for your reply!!

My input is Column1 and Column2 and expected output is Column1 and Column3.

Input:

Row	Column1	Column2
1	52212227	15
2	52212227	15 -M
3	52212227	15 -TS
4	52212227	15 -PR
5	52212227	15 -HQPAR
6	52212227	15 -DU
7	52212227	15 -GU
8	52212227	23
9	52212227	23 -M
10	52212227	23 -TS
11	52212227	23 -PR
12	52212227	23 -HQPAR
13	52212227	23 -DU
14	52212227	23 -GU
15	52212227	24
16	52212227	24 -M
17	52212227	24 -TS
18	52212227	24 -PR
19	52212227	24 -HQPAR
20	52212227	24 -DU
21	52212227	24 -GU
22	52212227	27
23	52212227	30

Expected Output:

Row	Column1	Column3
1	52212227	15,23,24,27,30
2	52212227	15-M,23-M,24-M
3	52212227	15-TS,23-TS,24-TS
4	52212227	15-PR,23-PR,24-PR
5	52212227	15-HQPAR,23-HQPAR,24-HQPAR
6	52212227	15-DU,23-DU,24-DU
7	52212227	15-GU,23-GU,24-GU
8	52212227	15,23,24,27,30
9	52212227	15-M,23-M,24-M
10	52212227	15-TS,23-TS,24-TS
11	52212227	15-PR,23-PR,24-PR
12	52212227	15-HQPAR,23-HQPAR,24-HQPAR
13	52212227	15-DU,23-DU,24-DU
14	52212227	15-GU,23-GU,24-GU
15	52212227	15,23,24,27,30
16	52212227	15-M,23-M,24-M
17	52212227	15-TS,23-TS,24-TS
18	52212227	15-PR,23-PR,24-PR
19	52212227	15-HQPAR,23-HQPAR,24-HQPAR
20	52212227	15-DU,23-DU,24-DU
21	52212227	15-GU,23-GU,24-GU
22	52212227	15,23,24,27,30
23	52212227	15,23,24,27,30

SAS@EMMAUS

ballardw · Posted 04-26-2017 03:35 PM

@gandikk wrote:

Reeza,

Thanks for your reply!!

My input is Column1 and Column2 and expected output is Column1 and Column3.

Input:

Row Column1 Column2

1 52212227 15

2 52212227      15 -M

3 52212227      15 -TS

4 52212227      15 -PR

5 52212227      15 -HQPAR

6 52212227      15 -DU

7 52212227      15 -GU

8 52212227 23

9 52212227      23 -M

10 52212227      23 -TS

11 52212227      23 -PR

12 52212227      23 -HQPAR

13 52212227      23 -DU

14 52212227      23 -GU

15 52212227 24

16 52212227      24 -M

17 52212227      24 -TS

18 52212227      24 -PR

19 52212227      24 -HQPAR

20 52212227      24 -DU

21 52212227      24 -GU

22 52212227 27

23 52212227 30

Expected Output:

Row Column1 Column3

1 52212227 15,23,24,27,30 where do the 23, 24, 27 and 30 come from???

2 52212227 15-M,23-M,24-M where does 15-M, 23-M and 24-M come from???

3 52212227 15-TS,23-TS,24-TS

4 52212227 15-PR,23-PR,24-PR

5 52212227 15-HQPAR,23-HQPAR,24-HQPAR

6 52212227 15-DU,23-DU,24-DU

7 52212227 15-GU,23-GU,24-GU

8 52212227 15,23,24,27,30

9 52212227 15-M,23-M,24-M

10 52212227 15-TS,23-TS,24-TS

11 52212227 15-PR,23-PR,24-PR

12 52212227 15-HQPAR,23-HQPAR,24-HQPAR

13 52212227 15-DU,23-DU,24-DU

14 52212227 15-GU,23-GU,24-GU

15 52212227 15,23,24,27,30

16 52212227 15-M,23-M,24-M

17 52212227 15-TS,23-TS,24-TS

18 52212227 15-PR,23-PR,24-PR

19 52212227 15-HQPAR,23-HQPAR,24-HQPAR

20 52212227 15-DU,23-DU,24-DU

21 52212227 15-GU,23-GU,24-GU

22 52212227 15,23,24,27,30

23 52212227 15,23,24,27,30

mkeintz · Posted 04-26-2017 03:37 PM

You can do this by building and using a hash table. Each "item" (row) in the hash table is keyed on the the second "word" in column 2, and it keeps track of a list of correspond full column2 values. The variable word_list keep track of the original order the "word2" values were encountered. Both the hash table and word_list are cleared for re-use at the end of a column1 group:


data want2 (keep=column1 column3);
  if _n_=1 then do;
    length word2 $10  column3 $50;
    declare hash w2();
	  w2.definekey('word2');
	  w2.definedata('column1','column3');
	  w2.definedone();
  end;

  length word_list $100;
  do until (last.column1);
    set have;
	by column1;
	word2=scan(column2,2,'-');
    rc=w2.find();
	if rc^=0 then do;
	  word_list=catx(' ',word_list,word2);
	  column3=column2;
	end;
	else column3=catx(',',column3,column2);
	rc=w2.replace();
  end;
  do w=1 to countw(word_list);
	rc=w2.find(key:scan(word_list,w,' '));
    output;
  end;
  rc=w2.clear();
run;

I assume your data is already sorted by column1.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

gandikk · Posted 04-27-2017 09:49 AM

Hi mkeintz,

Thanks for your reply!!

I tried your code. The code is writing the last value of column2 in the duplicate group to all the values in column1. I used the below code. I couldn't get what is going wrong here. Can you please let me know?

DATA A;
INFILE INPUT1;
INPUT @01 COLUMN1 $CHAR9.
@64 COLUMN2 $CHAR10.;

DATA WANT2 (KEEP=COLUMN1 COLUMN3);
IF _N_=1 THEN DO;
LENGTH WORD2 $10 COLUMN3 $50;
DECLARE HASH W2(DATASET:'A');
W2.DEFINEKEY('COLUMN2');
W2.DEFINEDATA('COLUMN1','COLUMN2');
W2.DEFINEDONE();
END;
LENGTH WORD_LIST $100;
DO UNTIL (LAST.COLUMN1);
SET A;
BY COLUMN1;
WORD2=SCAN(COLUMN2,2,'-');
RC=W2.FIND();

IF RC^=0 THEN DO;
WORD_LIST=CATX(' ',WORD_LIST,WORD2);
COLUMN3=COLUMN2;
END;
ELSE COLUMN3=CATX(',',COLUMN3,COLUMN2);
RC=W2.REPLACE();
END;
DO W=1 TO COUNTW(WORD_LIST);
RC=W2.FIND(KEY:SCAN(WORD_LIST,W,' '));
OUTPUT;
END;
RC=W2.CLEAR();
RUN;
DATA _NULL_;SET WANT2;
FILE OUTPUT;
PUT
@1 COLUMN1 $CHAR9.
@10 COLUMN3 $CHAR50.;

my input:

052212227	15
052212227	15 -M
052212227	15 -TS
052212227	15 -PR
052212227	15 -HQPAR
052212227	15 -DU
052212227	15 -GU
052212227	23
052212227	23 -M
052212227	23 -TS
052212227	23 -PR
052212227	23 -HQPAR
052212227	23 -DU
052212227	23 -GU
052212227	24
052212227	24 -M
052212227	24 -TS
052212227	24 -PR
052212227	24 -HQPAR
052212227	24 -DU
052212227	24 -GU
052212227	27
052212227	30

Output I got:

052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30
052212227	30

SAS@EMMAUS

novinosrin · Posted 04-27-2017 11:12 AM

Very similar to Xia keshan, I had it typed earlier at work, so just pasting here after coming home for the sake of it-

data have1;

set have;

length grp $10;

if notdigit(strip(scan(column2,-1,'-')))=0 then grp='1';

else grp=strip(scan(column2,-1,'-'));

run;

proc sort data=have1;

by column1 grp;

run;

data want;

set have1;

by column1 grp;

length column3 $50;

retain column3;

if first.column1 or first.grp then call missing(column3);

column3=catx(',',column3,column2);

if last.grp then output;

drop grp;

run;

Regards,

Naveen Srinivasan

gandikk · Posted 05-02-2017 04:03 PM

Hi Srinivasan,

Your solution worked for me!! Many thanks for taking time to post the solution!!

SAS@EMMAUS

Ksharp · Posted 04-27-2017 09:59 AM

data have;
infile cards expandtabs truncover;
input Row	Column1	: $20. Column2 $20.;
cards;
1	52212227	15
2	52212227	     15 -M
3	52212227	     15 -TS
4	52212227	     15 -PR
5	52212227	     15 -HQPAR
6	52212227	     15 -DU
7	52212227	     15 -GU
8	52212227	23
9	52212227	     23 -M
10	52212227	     23 -TS
11	52212227	     23 -PR
12	52212227	     23 -HQPAR
13	52212227	     23 -DU
14	52212227	     23 -GU
15	52212227	24
16	52212227	     24 -M
17	52212227	     24 -TS
18	52212227	     24 -PR
19	52212227	     24 -HQPAR
20	52212227	     24 -DU
21	52212227	     24 -GU
22	52212227	27
23	52212227	30
;
run;
data temp;
 set have;
 length group $ 80;
 temp=scan(column2,-1,'-');
 if notdigit(strip(temp))=0 then group='___';
  else group=temp;
run;
proc sort data=temp;
 by group;
run;
data temp1;
length want $ 200;
do until(last.group);
 set temp;
 by group;
 want=catx(',',want,column2);
end;
keep want group;
run;
data want;
 merge temp temp1;
 by group;
run;
proc sort data=want;by row;run;

Registration is open

SAS Training: Just a Click Away