Hello,
I want to compare negative line by positive line in terms of price. the comparison is made on the variables (name, city, age, product). I want a table which contains just the negatives with their positive correspondents and the name of the variable which is different between the two rows, if I find more than one difference between two rows I do not put it in this table like the case of " Oliver "I have two differences in the city and the age.
data have;
input row name $ city $ age product $ price;
CARDS;
1 Philip Paris 43 P1 -10
2 Mikael Marseille 23 P2 -20
3 Oliver Metz 32 P2 -25
4 Lisa Toulouse 43 P1 -15
5 Philip Paris 55 P1 10
6 Michael Marseille 23 P2 20
7 Oliver Lille 23 P1 25
8 Lisa Toulouse 43 P2 15
;
run;
data want;
input negative_row name $ city $ age product $ price positive_row column_diff $;
CARDS;
1 Philip Paris 43 P1 -10 5 age
2 Mikael Marseille 23 P2 -20 6 name
4 Lisa Toulouse 43 P1 -15 8 product
;
run;
You are right, I'll edit the code.
Try next tested code:
data have;
input row name $ city $ age product $ price;
CARDS;
1 Philip Paris 43 P1 -10
2 Mikael Marseille 23 P2 -20
3 Oliver Metz 32 P2 -25
4 Lisa Toulouse 43 P1 -15
5 Philip Paris 55 P1 10
6 Michael Marseille 23 P2 20
7 Oliver Lille 23 P1 25
8 Lisa Toulouse 43 P2 15
;
run;
proc sql;
create table temp as select
n.*,
p.row as positive_row,
p.city as p_city,
p.name as p_name,
p.product as p_product,
p.age as p_age,
(p.price + n.price) as diff
from have(where=(price < 0)) as n
full join have(where=(price>0)) as p
on n.price = -p.price
order by row;
quit;
data want;
set temp(rename=(row=negative_row_));
length column_diff $6;
/* d# is either true=1 or false=0 */
d1 = (name=p_name);
d2 = (city=p_city);
d3 = (age=P_age);
d4 = (product=p_product);
d5 = (diff=0);
if sum(of d1-d5) = 4 /* 4 out of 5 variables matches excluding row */
then do;
if not d1 then column_diff = "name"; else
if not d2 then column_diff = "city"; else
if not d3 then column_diff = "age"; else
if not d4 then column_diff = "product"; else
if not d5 then column_diff = "price";
output;
end;
keep negative_row_ name city age product price positive_row column_diff;
run;
You are right, I'll edit the code.
last part of the code:
then do;
if not d1 then column_diff = "name"; else
if not d2 then column_diff = "city"; else
if not d3 then column_diff = "age"; else
if not d4 then column_diff = "product"; else
if not d5 then column_diff = "price";
output;
end;
can be shortened and replaced with next code:
array nx {5} $ ("name" "city" "age" "product" "price");
if sum(of d1-d5) = 4 /* 4 out of 5 variables matches excluding row */
then do;
column_diff = nx(whichn(0,of d1-d5));
output;
end;
thus making it more dynamic to adapt to greater number of variables.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.