Hello,
I want to compare negative line by positive line in terms of price. the comparison is made on the variables (name, city, age, product). I want a table which contains just the negatives with their positive correspondents and the name of the variable which is different between the two rows, if I find more than one difference between two rows I do not put it in this table like the case of " Oliver "I have two differences in the city and the age.
data have;
input row name $ city $ age product $ price;
CARDS;
1 Philip Paris 43 P1 -10
2 Mikael Marseille 23 P2 -20
3 Oliver Metz 32 P2 -25
4 Lisa Toulouse 43 P1 -15
5 Philip Paris 55 P1 10
6 Michael Marseille 23 P2 20
7 Oliver Lille 23 P1 25
8 Lisa Toulouse 43 P2 15
;
run;
data want;
input negative_row name $ city $ age product $ price positive_row column_diff $;
CARDS;
1 Philip Paris 43 P1 -10 5 age
2 Mikael Marseille 23 P2 -20 6 name
4 Lisa Toulouse 43 P1 -15 8 product
;
run;
You are right, I'll edit the code.
Try next tested code:
data have;
input row name $ city $ age product $ price;
CARDS;
1 Philip Paris 43 P1 -10
2 Mikael Marseille 23 P2 -20
3 Oliver Metz 32 P2 -25
4 Lisa Toulouse 43 P1 -15
5 Philip Paris 55 P1 10
6 Michael Marseille 23 P2 20
7 Oliver Lille 23 P1 25
8 Lisa Toulouse 43 P2 15
;
run;
proc sql;
create table temp as select
n.*,
p.row as positive_row,
p.city as p_city,
p.name as p_name,
p.product as p_product,
p.age as p_age,
(p.price + n.price) as diff
from have(where=(price < 0)) as n
full join have(where=(price>0)) as p
on n.price = -p.price
order by row;
quit;
data want;
set temp(rename=(row=negative_row_));
length column_diff $6;
/* d# is either true=1 or false=0 */
d1 = (name=p_name);
d2 = (city=p_city);
d3 = (age=P_age);
d4 = (product=p_product);
d5 = (diff=0);
if sum(of d1-d5) = 4 /* 4 out of 5 variables matches excluding row */
then do;
if not d1 then column_diff = "name"; else
if not d2 then column_diff = "city"; else
if not d3 then column_diff = "age"; else
if not d4 then column_diff = "product"; else
if not d5 then column_diff = "price";
output;
end;
keep negative_row_ name city age product price positive_row column_diff;
run;
You are right, I'll edit the code.
last part of the code:
then do;
if not d1 then column_diff = "name"; else
if not d2 then column_diff = "city"; else
if not d3 then column_diff = "age"; else
if not d4 then column_diff = "product"; else
if not d5 then column_diff = "price";
output;
end;
can be shortened and replaced with next code:
array nx {5} $ ("name" "city" "age" "product" "price");
if sum(of d1-d5) = 4 /* 4 out of 5 variables matches excluding row */
then do;
column_diff = nx(whichn(0,of d1-d5));
output;
end;
thus making it more dynamic to adapt to greater number of variables.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.