Hi everyone,
I'm new to SAS and I need help. I have 2 datasets, ALL FRUIT and BANANAS ONLY. I want to add a column to ALL FRUIT (BANANA PACKAGE) that will match in the BANANAS ONLY dataset. I want the BANANA PACKAGE column be blank if the Sales Type = Only or if Fruit = Banana otherwise "Yes". Basicaly I want to find out when bananas were sold with other fruit. Can someone help me with this?
ALL FRUIT
Customer_No | Fruit | Amount | Sale_Type |
100 | Banana | 5 | Package |
100 | Apple | 2 | Package |
100 | Orange | 3 | Package |
200 | Banana | 5 | Only |
300 | Apple | 2 | Package |
300 | Orange | 3 | Package |
400 | Banana | 5 | Package |
400 | Apple | 2 | Package |
400 | Orange | 3 | Package |
500 | Banana | 5 | Package |
500 | Apple | 2 | Package |
500 | Orange | 3 | Package |
600 | Apple | 2 | Only |
700 | Orange | 3 | Only |
800 | Apple | 2 | Package |
800 | Banana | 5 | Package |
900 | Banana | 5 | Only |
1000 | Banana | 5 | Package |
1000 | Orange | 3 | Package |
BANANAS ONLY
Customer_No | Fruit | Amount | Sale_Type |
100 | Banana | 5 | Package |
200 | Banana | 5 | Only |
400 | Banana | 5 | Package |
500 | Banana | 5 | Package |
800 | Banana | 5 | Package |
900 | Banana | 5 | Only |
1000 | Banana | 5 | Package |
I think it will helpful to show what you expect for output as I can see a least two different results.
Here is what it should be.
Customer_No | Fruit | Amount | Sale_Type | Banana_Package |
100 | Banana | 5 | Package | |
100 | Apple | 2 | Package | Yes |
100 | Orange | 3 | Package | Yes |
200 | Banana | 5 | Only | |
300 | Apple | 2 | Package | |
300 | Orange | 3 | Package | |
400 | Banana | 5 | Package | |
400 | Apple | 2 | Package | Yes |
400 | Orange | 3 | Package | Yes |
500 | Banana | 5 | Package | |
500 | Apple | 2 | Package | Yes |
500 | Orange | 3 | Package | Yes |
600 | Apple | 2 | Only | |
700 | Orange | 3 | Only | |
800 | Apple | 2 | Package | Yes |
800 | Banana | 5 | Package | |
900 | Banana | 5 | Only | |
1000 | Banana | 5 | Package | |
1000 | Orange | 3 | Package | Yes |
If the order of the fruit in the output isn't critical this appears to work.
If you need to keep order, then add a variable to All_fruit to use to sort on and the PROC sql code would use an
ORDER BY clause with that variable after the On comparison within the same ; to close the query.
data ALL_FRUIT;
input Customer_No Fruit $ Amount Sale_Type $;
datalines;
100 Banana 5 Package
100 Apple 2 Package
100 Orange 3 Package
200 Banana 5 Only
300 Apple 2 Package
300 Orange 3 Package
400 Banana 5 Package
400 Apple 2 Package
400 Orange 3 Package
500 Banana 5 Package
500 Apple 2 Package
500 Orange 3 Package
600 Apple 2 Only
700 Orange 3 Only
800 Apple 2 Package
800 Banana 5 Package
900 Banana 5 Only
1000 Banana 5 Package
1000 Orange 3 Package
;
run;
Data BANANAS_ONLY;
input Customer_No Fruit $ Amount Sale_Type $;
datalines;
100 Banana 5 Package
200 Banana 5 Only
400 Banana 5 Package
500 Banana 5 Package
800 Banana 5 Package
900 Banana 5 Only
1000 Banana 5 Package
;
run;
Proc sql;
create table new_all_fruit as
select a.*, case
when a.Sale_Type = 'Only' then ''
when missing(b.Customer_No) then ''
when a.fruit ne b.fruit then 'Yes'
end as Banana_package
from All_fruit as a left join BANANAS_ONLY as b
on a.Customer_No=b.Customer_No;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.