Re: alternatives to recursive sql/ deducing the rows

vminc · Posted 07-17-2015 05:58 PM

Hi I have a data set with one to many relation. I want to reduce it to 1-1 by concatenating the distinct values for each column

Here is an example

I have a data set called Customer and Product Affiliation (PA). One customer can have multiple PA for different times.

Here is the input

Customer

PA1

PA2

PA3

A

H

M

L

A

H

L

M

A

H

M

H

Desired Output

Customer

PA1

PA2

PA3

A

H

M & L

L & M & H

i have multiple PA columns. I want to know if there is a generic sas code I can apply

Thank you

VMInc

Patrick · Posted 07-17-2015 09:31 PM

It is most of the time not a good idea to organize data the way you want to. The next thing you'll be asking for is having the strings sorted (so that it's not the one time "L & M & H" and another time "H & L & M").

What do you want to achieve? Why do you want the data transformed this way? There might be other options.

What's normally a good thing from a data organization point of view is to have data in a long structure. That's what below code does:

data have;

infile datalines truncover dlm=' ';

input (Customer PA1 PA2 PA3) ($);

datalines;

A H M L

A H L M

A H M H

;

run;

data demo(drop= pa:);

set have;

length var_name $32 var_value $8;

array pa_arr pa:;

do over pa_arr;

var_name=vname(pa_arr);

var_value=pa_arr;

output;

end;

run;

vminc · Posted 07-17-2015 10:14 PM

Hi Patrick

End user wants just one record per customer. Sorting the values internally is not needed. What is needed is concatenation of rows .

Even if the final output has repeated values..that is fine.

Example

A H&H&H M&L&M L&M&H

Patrick · Posted 07-17-2015 10:55 PM

As consultants we must sometimes challenge requirements and show our customers alternatives. But o.k., I don't have the real picture here. Below a code sample which could work. The "hash" bit is for avoiding repeated values as this is not only "ugly" but depending on how many records you're having could also lead to very long strings eventually exceeding the defined target variable length.

data have;

infile datalines truncover dlm=' ';

input (Customer PA1 PA2 PA3) ($);

datalines;

A H M L

A H L M

A H M H

;

run;

proc sql noprint;

select

strip(name), cats('L_',name) into :PA_List separated by ' ' , :PA_new_list separated by ' '

from dictionary.columns

where libname='WORK' and memname='HAVE' and upcase(name) like 'PA%'

;

quit;

data want(drop=_: &PA_List);

set have;

by customer;

if _n_=1 then

do;

length _pa_unique $20 _vname $32;

dcl hash h(hashexp:5);

_rc=h.defineKey('customer','_pa_unique','_vname');

_rc=h.defineDone();

length &PA_new_list $200;

retain &PA_new_list;

array pa_arr {*} &PA_List;

array pa_new_arr {*} &PA_new_list;

end;

do _i=1 to dim(pa_arr);

_pa_unique=pa_arr[_i];

_vname=vname(pa_arr[_i]);

if h.check() ne 0 then

do;

pa_new_arr[_i]=catx(' & ',pa_arr[_i],pa_new_arr[_i]);

_rc=h.add();

end;

if last.customer then output;

run;

Reeza · Posted 07-18-2015 12:19 AM

As consultants we must sometimes challenge requirements and show our customers alternatives.

:smileylove: X 1000

PGStats · Posted 07-17-2015 11:03 PM

Although I agree with that this kind of wide data organisation is not very practical, here is a way to get it for any number of PA, with sorted value lists:

data have;

input (Customer PA1 PA2 PA3) ($:);

datalines;

A H M L

A H L M

A H M H

;

data have0 / view=have0;

set have;

obs = _n_;

run;

proc transpose data=have0 out=have1(drop=obs rename=col1=PA);

by obs customer;

var pa:;

run;

proc sort data=have1 nodupkey; by customer _NAME_ PA; run;

data want0(rename=_NAME_=ID);

length list $100;

do until(last.customer);

do until(last._NAME_);

set have1; by customer _NAME_;

list = catx(" & ", list, PA);

end;

output;

call missing(list);

end;

run;

proc transpose data=want0 out=want(drop=_NAME_);

by customer;

id ID;

var list;

run;

PG

Ksharp · Posted 07-18-2015 02:45 AM

The most efficient way is ARRAY. But I am too lazy, so use SQL instead of .

data have;

input (Customer PA1 PA2 PA3) ($:);

datalines;

A H M L

A H L M

A H M H

;

run;

proc sql;

create table temp as

select distinct Customer,'PA1' length=10 as id,PA1 as value

from have

union all

select distinct Customer,'PA2' length=10 as id,PA2 as value

from have

union all

select distinct Customer,'PA3' length=10 as id,PA3 as value

from have;

quit;

data temp1;

set temp;

by Customer id;

length v $ 200;

retain v;

v=catx(' & ',v,value);

if last.id then do;output;call missing(v);end;

drop value;

run;

proc transpose data=temp1 out=want(drop=_:);

by Customer;

id id;

var v;

run;

alternatives to recursive sql/ deducing the rows