Re: Create column based on record value and group

nicnad · Posted 05-22-2013 05:14 PM

Hi,

Lets says I have the following table named all_criteria

ClientId	CriteriaId	Criteria_Value
12345	1	Yes
12345	2	No
12345	3	John
6789	1	Yes
6789	1	Yes
aaabbb	1	Yes
aaabbb	1	No
aaabbb	2	Yes

And I want to turn it into this table where a column named C* is created for each criteriaId and the value in this cell is the criteria_value. I would also like to reject identical criteria value for the same criteriaid, but concatenate different criteria value (separated with "-") for the same criteriaid. Something like :

ClientID	C1	C2	C3
12345	Yes	No	John
6789	Yes
aaabbb	Yes-No	Yes

I have this little code to create new columns based on criteriaid, but I am clueless how to group by clientid and concatenate the criteria_value if it is different.

Thank you for your help and time.

Haikuo · Posted 05-22-2013 09:32 PM

Although it is possible for people to do it in one-pass data step, I won't force it, you could trade off the simplicity and potentially, some robustness. Therefore I will do it in two steps:

data have1;

input ClientId $ CriteriaId Criteria_Value$;

cards;

12345 1 Yes

12345 2 No

12345 3 John

6789 1 Yes

aaabbb 1 Yes

aaabbb 1 No

aaabbb 2 Yes

;

/*Step1: Using Hash() to prep the data, this is the key step*/

data _null_;

if _n_=1 then do;

if 0 then set have1;

declare hash h(ordered:'a');

h.definekey('ClientId' , 'CriteriaId');

h.definedata('ClientId' , 'CriteriaId', 'Criteria_Value');

h.definedone();

end;

set have1(rename=(Criteria_Value=_value)) end=last;

if h.find() ne 0 then do; Criteria_Value=_value; h.replace();end;

else if Criteria_Value ne _value then do; Criteria_Value=catx('-',Criteria_Value,_value);h.replace();end;

if last then h.output(dataset:'have2');

run;

/*Step2: classical style transpose approach, nothing fancy*/

proc transpose data=have2 out=want prefix=C;

by ClientId notsorted;

var Criteria_Value;

id CriteriaId;

run;

Haikuo

Tom · Posted 05-22-2013 10:57 PM

Other than the request to concatenate the values when there are duplicate key values it could be handled by a simple PROC TRANSPOSE call.

So add a step to do the concatenation. You probably will want to make sure the data is sorted properly. You could probably eliminate the duplicates in the sorting, or the code below will use FIRST./LAST. variables to make sure that only one copy of each value is concatenated.

data middle ;

set all_criteria ;

by clientid criteriaid criteria_value;

length new $200 ;

retain new;

if first.criteriaid then new = criteria_value;

else if first.criteria_value then new = catx('-',new,criteria_value);

if last.criteriaid;

run;

proc transpose data=middle out=want prefix=c ;

by clientid ;

id criteriaid ;

var new;

run;

nicnad · Posted 05-23-2013 11:02 AM

Thank you both for your replies.

Both solution works well but there are a few things I would like to work out.

First off hai.kuo solution does not seem to remove duplicate criteria_value when concatenating.

Also, the columns aren't sorted. My real data has thirty criteriaid and it goes c13,c12,c1,c2 and so on... I would like the column to be sorted.

Finally, I will be using this table to merge with another one from time to time. The first time I create the table, there may not be all my 30 criteriaid, but I want to make sure that columns c1 to c30 are there in order to make my merge painless.

Thank you for your help and time!

Haikuo · Posted 05-23-2013 12:13 PM

@nicnad,

Quote: "First off hai.kuo solution does not seem to remove duplicate criteria_value when concatenating."

It was fine on my SAS 9.3,. Windows 7 64bit, Hash() remove dups by default, for the raw input as is, the code produces exact output you required, unless you mean something else otherwise? So please post the part of data that the code did not work.

Quote: "Also, the columns aren't sorted. My real data has thirty criteriaid and it goes c13,c12,c1,c2 and so on... I would like the column to be sorted."

Again, Hash() sort your data by default, you raw data does not need to be presorted, and your outcome table will be sorted.

Hash() solution tackles your two concerns dead-on, now you make me confused.

Haikuo

Update: I think I have figured out your first concern, it can be easily fixed, but I could be wrong that if you mean something else.

data have1;

input ClientId $ CriteriaId Criteria_Value$;

cards;

aaabbb 1 No

12345 2 No

12345 3 John

6789 1 Yes

6789 1 No

6789 1 Yes

aaabbb 1 Yes

aaabbb 2 Yes

12345 1 Yes

;

/*Step1: Using Hash() to prep the data, this is the key step*/

data _null_;

if _n_=1 then do;

length Criteria_Value $50;

if 0 then set have1;

declare hash h(ordered:'a');

h.definekey('ClientId' , 'CriteriaId');

h.definedata('ClientId' , 'CriteriaId', 'Criteria_Value');

h.definedone();

end;

set have1(rename=(Criteria_Value=_value)) end=last;

if h.find() ne 0 then do; Criteria_Value=_value; h.replace();end;

else if findw(Criteria_Value, compress(_value)) = 0 then do; Criteria_Value=catx('-',Criteria_Value,_value);h.replace();end;

if last then h.output(dataset:'have2');

run;

/*Step2: classical style transpose approach, nothing fancy*/

proc transpose data=have2 out=want prefix=C;

by ClientId notsorted;

var Criteria_Value;

id CriteriaId;

run;

Please note that I have completely rearranged your raw data, hopefully both of your concerns can be addressed. The only thing that could stop you from using Hash() is your table size bigger than the available RAM.

Create column based on record value and group