Help using Base SAS procedures

Create column based on record value and group

Reply
Regular Contributor
Posts: 186

Create column based on record value and group

Hi,

Lets says I have the following table named all_criteria

ClientIdCriteriaIdCriteria_Value
123451Yes
123452No
123453John
67891Yes
67891Yes
aaabbb1Yes
aaabbb1No
aaabbb2Yes

And I want to turn it into this table where a column named C* is created for each criteriaId and the value in this cell is the criteria_value. I would also like to reject identical criteria value for the same criteriaid, but concatenate different criteria value (separated with "-") for the same criteriaid. Something like :

ClientIDC1C2C3
12345YesNoJohn
6789Yes
aaabbbYes-NoYes

I have this little code to create new columns based on criteriaid, but I am clueless how to group by clientid and concatenate the criteria_value if it is different.

Thank you for your help and time.

Respected Advisor
Posts: 3,124

Re: Create column based on record value and group

Although it is possible for people to do it in one-pass data step, I won't force it, you could trade off the simplicity and potentially, some robustness. Therefore I will do it in two steps:

data have1;

  input ClientId $      CriteriaId  Criteria_Value$;

  cards;

12345 1     Yes

12345 2     No

12345 3     John

6789  1     Yes

6789  1     Yes

aaabbb      1     Yes

aaabbb      1     No

aaabbb      2     Yes

;

/*Step1: Using Hash() to prep the data, this is the key step*/

data _null_;

  if _n_=1 then do;

     if 0 then set have1;

       declare hash h(ordered:'a');

       h.definekey('ClientId' , 'CriteriaId');

       h.definedata('ClientId' , 'CriteriaId', 'Criteria_Value');

       h.definedone();

   end;

   set have1(rename=(Criteria_Value=_value)) end=last;

   if h.find() ne 0 then do; Criteria_Value=_value; h.replace();end;

   else if Criteria_Value ne _value then do; Criteria_Value=catx('-',Criteria_Value,_value);h.replace();end;

   if last then h.output(dataset:'have2');

run;

/*Step2: classical style transpose approach, nothing fancy*/

proc transpose data=have2 out=want prefix=C;

  by ClientId notsorted;

  var Criteria_Value;

  id CriteriaId;

run;


Haikuo

Super User
Super User
Posts: 6,502

Re: Create column based on record value and group

Other than the request to concatenate the values when there are duplicate key values it could be handled by a simple PROC TRANSPOSE call.

So add a step to do the concatenation.  You probably will want to make sure the data is sorted properly.  You could probably eliminate the duplicates in the sorting, or the code below will use FIRST./LAST. variables to make sure that only one copy of each value is concatenated.

data middle ;

set all_criteria ;

by clientid criteriaid criteria_value;

length new $200 ;

retain new;

if first.criteriaid then new = criteria_value;

else if first.criteria_value then new = catx('-',new,criteria_value);

if last.criteriaid;

run;

proc transpose data=middle out=want prefix=c ;

  by clientid ;

  id criteriaid ;

  var new;

run;

Regular Contributor
Posts: 186

Re: Create column based on record value and group

Thank you both for your replies.

Both solution works well but there are a few things I would like to work out.

First off hai.kuo solution does not seem to remove duplicate criteria_value when concatenating.

Also, the columns aren't sorted. My real data has thirty criteriaid and it goes c13,c12,c1,c2 and so on... I would like the column to be sorted.

Finally, I will be using this table to merge with another one from time to time. The first time I create the table, there may not be all my 30 criteriaid, but I want to make sure that columns c1 to c30 are there in order to make my merge painless.

Thank you for your help and time!

Respected Advisor
Posts: 3,124

Re: Create column based on record value and group

@nicnad,

Quote: "First off hai.kuo solution does not seem to remove duplicate criteria_value when concatenating."

It was fine on my SAS 9.3,. Windows 7 64bit, Hash() remove dups by default, for the raw input as is, the code produces exact output you required,  unless you mean something else otherwise? So please post the part of data that the code did not work.

Quote: "Also, the columns aren't sorted. My real data has thirty criteriaid and it goes c13,c12,c1,c2 and so on... I would like the column to be sorted."

Again, Hash() sort your data by default, you raw data does not need to be presorted, and your outcome table will be sorted.

Hash() solution tackles your two concerns dead-on, now you make me confused.

Haikuo

Update: I think I have figured out your first concern, it can be easily fixed, but I could be wrong that if you mean something else.

data have1;

  input ClientId $      CriteriaId  Criteria_Value$;

  cards;

aaabbb      1 No

12345 2     No

12345 3     John

6789  1 Yes

6789  1 No

6789  1 Yes

aaabbb      1 Yes

aaabbb      2 Yes

12345 1     Yes

;


/*Step1: Using Hash() to prep the data, this is the key step*/

data _null_;

  if _n_=1 then do;

   length Criteria_Value $50;

     if 0 then set have1;

       declare hash h(ordered:'a');

       h.definekey('ClientId' , 'CriteriaId');

       h.definedata('ClientId' , 'CriteriaId', 'Criteria_Value');

       h.definedone();

   end;

   set have1(rename=(Criteria_Value=_value)) end=last;

   if h.find() ne 0 then do; Criteria_Value=_value; h.replace();end;

   else if findw(Criteria_Value, compress(_value)) = 0 then do; Criteria_Value=catx('-',Criteria_Value,_value);h.replace();end;

   if last then h.output(dataset:'have2');

run;

/*Step2: classical style transpose approach, nothing fancy*/

proc transpose data=have2 out=want prefix=C;

  by ClientId notsorted;

  var Criteria_Value;

  id CriteriaId;

run;

Please note that I have completely rearranged your raw data, hopefully both of your concerns can be addressed. The only thing that could stop you from using Hash() is your table size bigger than the available RAM. 

Ask a Question
Discussion stats
  • 4 replies
  • 187 views
  • 1 like
  • 3 in conversation