Solved: concatenate through group by using SAS or PROC SQL, urgent help needed

tapas_16880 · Posted 07-19-2012 09:18 AM

Hi All, I am stuck with a SAS file and not sure how to progress.

The layout is -

Cust No	Prod	Acc flag	Bal
111	a1	qaz	10
111	b1	wsx	11
222	c1	wsx	233
333	d1	grw	251
444	e1	dc	25
444	f1	vefw	1231
444	g1	wefr	36
555	h1	bdfgh	615
666	i1	qe	767
777	j1	erg	31
777	k1	eth	472
777	l1	ret	251
777	m1	dfwef	3761
888	p1	ffef	36

I want to introduce two columns in this sas file. One lets say prod_con and other can be Acc_con.

It will look for the cust_no values in this above file. If it finds distinct values then it will simply copy the values of the column Prod and Acc Flag to the new columns prod_con and Acc_con respectively. But whenever it will find duplicates, it should concatenate the values in column Prod and Acc Flag and then copy them into the new columns prod_con and Acc_con respectively.

E.g. for Cust_no = 777, it should put j1k1l1m1 to prod_con and ergethretdfwef to Acc_con and this value will be repeated off course for all 777.

ErikT · Posted 07-20-2012 01:05 AM

How about this:

Create an index on Cust_no on the file

Then create an extra file of unique customers numbers

Read the file with the unique customers numbers and for each one go to the original file using the KEY option in the SET statement to retrieve and concatenate the Prod and Acc_flag variables. The _iorc_=0 and _error_=0 at the end of the step are to prevent error messages in the log.

data original(index=(Cust_no));

input Cust_no Prod $ ACC_flag $ Bal $ ;

datalines;

111 a1 qaz 10

111 b1 wsx 11

222 c1 wsx 233

333 d1 grw 251

444 e1 dc 25

444 f1 vefw 1231

444 g1 wefr 36

555 h1 bdfgh 615

666 i1 qe 767

777 j1 erg 31

777 k1 eth 472

777 l1 ret 251

777 m1 dfwef 3761

888 p1 ffef 36

run;

proc sort data=original(keep=cust_no) out=customers nodupkey;

by Cust_no;

run;

data new;

set customers;

length Prod_con Acc_con $50;

do until (_iorc_ ne 0);

set original key = Cust_no;

if _iorc_ = 0 then do;

Prod_con = CATX('|',Prod_Con,Prod);

Acc_con = CATX('|',Acc_con,Acc_flag);

end;

output;

_iorc_ = 0; _error_=0;

run;

proc print;

run;

View solution in original post

kevin_123 · Posted 07-19-2012 09:48 AM

Hi Tapas. I recommend doing two passes if you do know how many duplicates you can encounter. First to create your string and the second pass to add it back to the original records

data inds; 
input   custno prod $ acc_flag $ bal ; 
 datalines; 111 a1 qaz 10 
 111 b1 wsx 11 
 222 c1 wsx 233 
 333 d1 grw 251 
 444 e1 dc 25 
 444 f1 vefw 1231 
 444 g1 wefr 36 
 555 h1 bdfgh 615 
 666 i1 qe 767 
 777 j1 erg 31 
 777 k1 eth 472 
 777 l1 ret 251 
 777 m1 dfwef 
 3761 
 888 p1 ffef 36 ; 
 run; 
 proc sort data=inds; by custno prod; run; 
 data tmp_ds (keep=custno prod_con acc_con); 
 set inds; 
 by custno; 
 length prod_con acc_con $50.; 
 retain prod_con acc_con; 
 if first.custno then do;
 prod_con =""; Acc_con = ""; 
 end; 
 prod_con = trim(prod_con) || trim(prod);
 acc_con = trim(acc_con) || trim(acc_flag);
 if last.custno then output; 
 run; 
 data final_ds; 
 merge inds  
   tmp_ds;
 by custno; 
 run;

Keith · Posted 07-19-2012 10:02 AM

@kevin_123,

That's pretty much how I would do it. One simplification you could make is to use the CALL CATS function in place of

Varname=TRIM(....) || TRIM(...).

So the 2 lines would be :

call cats(prod_con,prod);

call cats(acc_con,acc_flag);

PaigeMiller · Posted 07-19-2012 09:48 AM

Here's how I would approach this problem

Sort by CustNo
Proc Transpose variable Prod using BY variable CustNo
Proc Transpose variable AccFlag using BY variable CustNo
merge outputs of above 2 steps BY variable CustNo
concatenate as appropriate to produce the two new columns

--
Paige Miller

Haikuo · Posted 07-19-2012 10:39 AM

Hi, From what I can see it is a typical 2XDOW scenario, if the group variable 'custno' is clustered, then no need to sort:

data have;

input (custno prod acc_flag) (:$) bal ;

datalines;

111 a1 qaz 10

111 b1 wsx 11

222 c1 wsx 233

333 d1 grw 251

444 e1 dc 25

444 f1 vefw 1231

444 g1 wefr 36

555 h1 bdfgh 615

666 i1 qe 767

777 j1 erg 31

777 k1 eth 472

777 l1 ret 251

777 m1 dfwef 3761

888 p1 ffef 36

;

data want;

do until (last.custno);

set have;

by custno notsorted;

length prod_con acc_con $50.;

prod_con=cats(prod_con,prod);

acc_con=cats(acc_con,acc_flag);

end;

do until (last.custno);

set have;

by custno notsorted;

output;

end;

run;

proc print;run;

Haikuo

Haikuo · Posted 07-19-2012 11:06 AM

I am also wondering if there is way to use cats() +group by in proc sql in a summary function kind of way, seems neat to me.

For this question, there is another data step way, originally from Mike Zdeb (I think):

data have;

input (custno prod acc_flag) (:$) bal ;

datalines;

111 a1 qaz 10

111 b1 wsx 11

222 c1 wsx 233

333 d1 grw 251

444 e1 dc 25

444 f1 vefw 1231

444 g1 wefr 36

555 h1 bdfgh 615

666 i1 qe 767

777 j1 erg 31

777 k1 eth 472

777 l1 ret 251

777 m1 dfwef 3761

888 p1 ffef 36

;

data want;

set have (in=up) have;

by custno;

length prod_con acc_con $50.;

retain prod_con acc_con;

if first.custno then call missing (prod_con,acc_con);

if up then do;

prod_con=cats(prod_con,prod);

acc_con=cats(acc_con,acc_flag);

end;

else output;

run;

proc print;run;

Haikuo

PGStats · Posted 07-19-2012 01:57 PM

Hi Haikuo Bian, combining your two proposals might be optimal :

data want;

do until (last.custno);

set have (in=up) have;

by custno; /* NOTSORTED cannot be used here */

length prod_con acc_con $50.;

if up then do;

prod_con = cats(prod_con, prod);

acc_con = cats(acc_con, acc_flag);

end;

else output;

end;

run;

PG

notsorted removed by PG.

PG

Haikuo · Posted 07-19-2012 05:09 PM

Yes, PG. That is a good combination, Thanks!

Haikuo

Astounding · Posted 07-19-2012 12:23 PM

Questions, questions, questions ...

For a single customer, could you ever have two observations with the same Product? With the same Account flag?

The answers so far might be assuming uniqueness. That might be correct, but the question should at least be asked.

If duplicates are possible, should the same Product appear twice in the new prod_con variable?

The programming won't change drastically ... it's just a matter of tweaking it properly.

Possibly related: Could a product or an account flag ever contain an embedded blank?

ErikT · Posted 07-20-2012 01:05 AM

How about this:

Create an index on Cust_no on the file

Then create an extra file of unique customers numbers

Read the file with the unique customers numbers and for each one go to the original file using the KEY option in the SET statement to retrieve and concatenate the Prod and Acc_flag variables. The _iorc_=0 and _error_=0 at the end of the step are to prevent error messages in the log.

data original(index=(Cust_no));

input Cust_no Prod $ ACC_flag $ Bal $ ;

datalines;

111 a1 qaz 10

111 b1 wsx 11

222 c1 wsx 233

333 d1 grw 251

444 e1 dc 25

444 f1 vefw 1231

444 g1 wefr 36

555 h1 bdfgh 615

666 i1 qe 767

777 j1 erg 31

777 k1 eth 472

777 l1 ret 251

777 m1 dfwef 3761

888 p1 ffef 36

run;

proc sort data=original(keep=cust_no) out=customers nodupkey;

by Cust_no;

run;

data new;

set customers;

length Prod_con Acc_con $50;

do until (_iorc_ ne 0);

set original key = Cust_no;

if _iorc_ = 0 then do;

Prod_con = CATX('|',Prod_Con,Prod);

Acc_con = CATX('|',Acc_con,Acc_flag);

end;

output;

_iorc_ = 0; _error_=0;

run;

proc print;

run;

flyerrhappy2015 · Posted 05-19-2015 07:07 PM

Is there any way to do this in proc sql?

I need to calcualte 40+ column for sum and mean. SQL is easier to calculate sum and mean, but how can I concatenate row values across each ID: E.g. for Cust_no = 777, it should put j1k1l1m1 to prod_con and get the total balance for each SSN in proc sql

Ksharp · Posted 05-20-2015 09:59 AM

Yes. There is one way. But you need to list all of these unique value as columns , You wouldn't want to that . That is horrible .

concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Re: concatenate through group by using SAS or PROC SQL, urgent help needed

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!