BookmarkSubscribeRSS Feed
vminc
Calcite | Level 5

Hi I have a data set with one to many relation. I want to reduce it to 1-1 by concatenating the distinct values for each column

Here is an example

I have a data set called Customer and Product Affiliation (PA). One customer can have multiple PA for different times.

Here is the input

CustomerPA1PA2


PA3

AHML
AHLM
AHMH

Desired Output

CustomerPA1PA2


PA3

AHM & LL & M & H

i have multiple PA columns. I want to know if there is a generic sas code I can apply

Thank you

VMInc

6 REPLIES 6
Patrick
Opal | Level 21

It is most of the time not a good idea to organize data the way you want to. The next thing you'll be asking for is having the strings sorted (so that it's not the one time "L & M & H" and another time "H & L & M").

What do you want to achieve? Why do you want the data transformed this way? There might be other options.

What's normally a good thing from a data organization point of view is to have data in a long structure. That's what below code does:

data have;

  infile datalines truncover dlm=' ';

  input (Customer PA1 PA2 PA3) ($);

datalines;

A H M L

A H L M

A H M H

;

run;

data demo(drop= pa:);

  set have;

  length var_name $32 var_value $8;

  array pa_arr pa:;

  do over pa_arr;

    var_name=vname(pa_arr);

    var_value=pa_arr;

    output;

  end;

run;

vminc
Calcite | Level 5

Hi Patrick

End user wants just one record per customer. Sorting the values internally is not needed. What is needed is concatenation of rows .

Even if the final output has repeated values..that is fine.

Example

A   H&H&H  M&L&M   L&M&H

Patrick
Opal | Level 21

As consultants we must sometimes challenge requirements and show our customers alternatives. But o.k., I don't have the real picture here. Below a code sample which could work. The "hash" bit is for avoiding repeated values as this is not only "ugly" but depending on how many records you're having could also lead to very long strings eventually exceeding the defined target variable length.

data have;

  infile datalines truncover dlm=' ';

  input (Customer PA1 PA2 PA3) ($);

  datalines;

A H M L

A H L M

A H M H

;

run;

proc sql noprint;

  select

    strip(name), cats('L_',name) into :PA_List separated by ' '  , :PA_new_list separated by ' ' 

  from dictionary.columns

    where libname='WORK' and memname='HAVE' and upcase(name) like 'PA%'

  ;

quit;

data want(drop=_: &PA_List);

  set have;

  by customer;

  if _n_=1 then

    do;

      length _pa_unique $20 _vname $32;

      dcl hash h(hashexp:5);

      _rc=h.defineKey('customer','_pa_unique','_vname');

      _rc=h.defineDone();

      length &PA_new_list $200;

      retain &PA_new_list;

      array pa_arr {*} &PA_List;

      array pa_new_arr {*} &PA_new_list;

    end;

  do _i=1 to dim(pa_arr);

    _pa_unique=pa_arr[_i];

    _vname=vname(pa_arr[_i]);

    if h.check() ne 0 then

      do;

        pa_new_arr[_i]=catx(' & ',pa_arr[_i],pa_new_arr[_i]);

        _rc=h.add();

      end;

  end;

  if last.customer then output;

run;

Reeza
Super User

As consultants we must sometimes challenge requirements and show our customers alternatives.

:smileylove: X 1000

PGStats
Opal | Level 21

Although I agree with that this kind of wide data organisation is not very practical, here is a way to get it for any number of PA, with sorted value lists:

data have;

input (Customer PA1 PA2 PA3) ($:);

datalines;

A H M L

A H L M

A H M H

;

data have0 / view=have0;

set have;

obs = _n_;

run;

proc transpose data=have0 out=have1(drop=obs rename=col1=PA);

by obs customer;

var pa:;

run;

proc sort data=have1 nodupkey; by customer _NAME_ PA; run;

data want0(rename=_NAME_=ID);

length list $100;

do until(last.customer);

    do until(last._NAME_);

        set have1; by customer _NAME_;

        list = catx(" & ", list, PA);

        end;

    output;

    call missing(list);

    end;

run;

proc transpose data=want0 out=want(drop=_NAME_);

by customer;

id ID;

var list;

run;

PG

PG
Ksharp
Super User

The most efficient way is ARRAY. But I am too lazy, so use SQL instead of .

data have;

input (Customer PA1 PA2 PA3) ($:);

datalines;

A H M L

A H L M

A H M H

;

run;

proc sql;

create table temp as

select distinct Customer,'PA1' length=10 as id,PA1 as value

  from have

union all

select distinct Customer,'PA2' length=10 as id,PA2 as value

  from have

union all

select distinct Customer,'PA3' length=10 as id,PA3 as value

  from have;

quit;

data temp1;

set temp;

by Customer id;

length v $ 200;

retain v;

v=catx(' & ',v,value);

if last.id then do;output;call missing(v);end;

drop value;

run;

proc transpose data=temp1 out=want(drop=_:);

by Customer;

id id;

var v;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1412 views
  • 1 like
  • 5 in conversation