DATA Step, Macro, Functions and more

Duplicate check across columns

Reply
Contributor
Posts: 36

Duplicate check across columns

Hi,

 

I am creating a framework which involves allocating scores to variables in columns. I have 6 variables that hold numeric data and I  

want to create a new flag with "Tie" or No "Tie" depending if any of the same numerics are featured in the list of variables and match the "highest score" column. I want the highest score column only allocated to one variable. Please see the red areas highlighted below. From the data attached you can see that there are variables tied with the numeric "5" which match the highest score column.Where there is a tie I want to use the variables in Green with the highest score in order to reassign the variable at the end of the process. This example should be reallocated from ACT_total_score to DR_total_score,

 

Would you happen to know the best way to do this?

 

Thanks

Chris

Super User
Super User
Posts: 7,955

Re: Duplicate check across columns

POst test data in the form of a datastep in your post.  Office files are dangerous and do not show structure.  Also, please show required output.  A simple datastep should be able to cover most of your requirements.

Contributor
Posts: 36

Re: Duplicate check across columns

Hi,

 

Please see the data step below:

 

data tie;

set test3;

if ACT_total_score >0 and ACT_total_score = largest and CORP_total_score >0 and CORP_total_score = largest then Allocation = "True"; else Allocation = "False";
if ACT_total_score >0 and ACT_total_score = largest and DR_total_score >0 and DR_total_score = largest then Allocation = "True"; else Allocation = "False";
if ACT_total_score >0 and ACT_total_score = largest and EPB_total_score >0 and EPB_total_score = largest then Allocation = "True"; else Allocation = "False";
if ACT_total_score >0 and ACT_total_score = largest and FIN_total_score >0 and FIN_total_score = largest then Allocation = "True"; else Allocation = "False";
if ACT_total_score >0 and ACT_total_score = largest and IP_IT_total_score >0 and IP_IT_total_score = largest then Allocation = "True"; else Allocation = "False";
if ACT_total_score >0 and ACT_total_score = largest and RE_total_score >0 and RE_total_score = largest then Allocation = "True"; else Allocation = "False";
if ACT_total_score >0 and ACT_total_score = largest and TAX_total_score >0 and TAX_total_score = largest then Allocation = "True"; else Allocation = "False";
run;

 

Basically, I just want to flag between 8 variables where the numerics in the data are the same and match it to a value stored in a column that identifies the highest value out of the 8 variables. The data step above feels long winded especially when I have to write all of the possible matched outcomes across the 8 variables.

 

Thanks

Super User
Posts: 19,806

Re: Duplicate check across columns

1. Create an array for your list of variables. 

2. Look at the MAX function to determine the highest value across an array

3. Look at WHICHN () tells you if a value is in the series, ie is 25 in list of array values or is the max in the list?

 

Contributor
Posts: 36

Re: Duplicate check across columns

Hi,

 

Thanks for sending the information through. How does the Which function deal with ties where the same numeric is within the array? I would want the column to be populated with a character to identify duplicates within the array.

Super User
Super User
Posts: 7,955

Re: Duplicate check across columns

Sorry, you misunderstand, I was after some test data in the form of a datastep, not the logic.  You can follow this post:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

Ask a Question
Discussion stats
  • 5 replies
  • 126 views
  • 0 likes
  • 3 in conversation