Hi I have a data set with one to many relation. I want to reduce it to 1-1 by concatenating the distinct values for each column
Here is an example
I have a data set called Customer and Product Affiliation (PA). One customer can have multiple PA for different times.
Here is the input
Customer | PA1 | PA2 | PA3 |
---|---|---|---|
A | H | M | L |
A | H | L | M |
A | H | M | H |
Desired Output
Customer | PA1 | PA2 | PA3 |
---|---|---|---|
A | H | M & L | L & M & H |
i have multiple PA columns. I want to know if there is a generic sas code I can apply
Thank you
VMInc
It is most of the time not a good idea to organize data the way you want to. The next thing you'll be asking for is having the strings sorted (so that it's not the one time "L & M & H" and another time "H & L & M").
What do you want to achieve? Why do you want the data transformed this way? There might be other options.
What's normally a good thing from a data organization point of view is to have data in a long structure. That's what below code does:
data have;
infile datalines truncover dlm=' ';
input (Customer PA1 PA2 PA3) ($);
datalines;
A H M L
A H L M
A H M H
;
run;
data demo(drop= pa:);
set have;
length var_name $32 var_value $8;
array pa_arr pa:;
do over pa_arr;
var_name=vname(pa_arr);
var_value=pa_arr;
output;
end;
run;
Hi Patrick
End user wants just one record per customer. Sorting the values internally is not needed. What is needed is concatenation of rows .
Even if the final output has repeated values..that is fine.
Example
A H&H&H M&L&M L&M&H
As consultants we must sometimes challenge requirements and show our customers alternatives. But o.k., I don't have the real picture here. Below a code sample which could work. The "hash" bit is for avoiding repeated values as this is not only "ugly" but depending on how many records you're having could also lead to very long strings eventually exceeding the defined target variable length.
data have;
infile datalines truncover dlm=' ';
input (Customer PA1 PA2 PA3) ($);
datalines;
A H M L
A H L M
A H M H
;
run;
proc sql noprint;
select
strip(name), cats('L_',name) into :PA_List separated by ' ' , :PA_new_list separated by ' '
from dictionary.columns
where libname='WORK' and memname='HAVE' and upcase(name) like 'PA%'
;
quit;
data want(drop=_: &PA_List);
set have;
by customer;
if _n_=1 then
do;
length _pa_unique $20 _vname $32;
dcl hash h(hashexp:5);
_rc=h.defineKey('customer','_pa_unique','_vname');
_rc=h.defineDone();
length &PA_new_list $200;
retain &PA_new_list;
array pa_arr {*} &PA_List;
array pa_new_arr {*} &PA_new_list;
end;
do _i=1 to dim(pa_arr);
_pa_unique=pa_arr[_i];
_vname=vname(pa_arr[_i]);
if h.check() ne 0 then
do;
pa_new_arr[_i]=catx(' & ',pa_arr[_i],pa_new_arr[_i]);
_rc=h.add();
end;
end;
if last.customer then output;
run;
As consultants we must sometimes challenge requirements and show our customers alternatives.
:smileylove: X 1000
Although I agree with that this kind of wide data organisation is not very practical, here is a way to get it for any number of PA, with sorted value lists:
data have;
input (Customer PA1 PA2 PA3) ($:);
datalines;
A H M L
A H L M
A H M H
;
data have0 / view=have0;
set have;
obs = _n_;
run;
proc transpose data=have0 out=have1(drop=obs rename=col1=PA);
by obs customer;
var pa:;
run;
proc sort data=have1 nodupkey; by customer _NAME_ PA; run;
data want0(rename=_NAME_=ID);
length list $100;
do until(last.customer);
do until(last._NAME_);
set have1; by customer _NAME_;
list = catx(" & ", list, PA);
end;
output;
call missing(list);
end;
run;
proc transpose data=want0 out=want(drop=_NAME_);
by customer;
id ID;
var list;
run;
PG
The most efficient way is ARRAY. But I am too lazy, so use SQL instead of .
data have;
input (Customer PA1 PA2 PA3) ($:);
datalines;
A H M L
A H L M
A H M H
;
run;
proc sql;
create table temp as
select distinct Customer,'PA1' length=10 as id,PA1 as value
from have
union all
select distinct Customer,'PA2' length=10 as id,PA2 as value
from have
union all
select distinct Customer,'PA3' length=10 as id,PA3 as value
from have;
quit;
data temp1;
set temp;
by Customer id;
length v $ 200;
retain v;
v=catx(' & ',v,value);
if last.id then do;output;call missing(v);end;
drop value;
run;
proc transpose data=temp1 out=want(drop=_:);
by Customer;
id id;
var v;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.