Hi everyone,
I'm new to SAS and I need help. I have 2 datasets, ALL FRUIT and BANANAS ONLY. I want to add a column to ALL FRUIT (BANANA PACKAGE) that will match in the BANANAS ONLY dataset. I want the BANANA PACKAGE column be blank if the Sales Type = Only or if Fruit = Banana otherwise "Yes". Basicaly I want to find out when bananas were sold with other fruit. Can someone help me with this?
ALL FRUIT
Customer_No | Fruit | Amount | Sale_Type |
100 | Banana | 5 | Package |
100 | Apple | 2 | Package |
100 | Orange | 3 | Package |
200 | Banana | 5 | Only |
300 | Apple | 2 | Package |
300 | Orange | 3 | Package |
400 | Banana | 5 | Package |
400 | Apple | 2 | Package |
400 | Orange | 3 | Package |
500 | Banana | 5 | Package |
500 | Apple | 2 | Package |
500 | Orange | 3 | Package |
600 | Apple | 2 | Only |
700 | Orange | 3 | Only |
800 | Apple | 2 | Package |
800 | Banana | 5 | Package |
900 | Banana | 5 | Only |
1000 | Banana | 5 | Package |
1000 | Orange | 3 | Package |
BANANAS ONLY
Customer_No | Fruit | Amount | Sale_Type |
100 | Banana | 5 | Package |
200 | Banana | 5 | Only |
400 | Banana | 5 | Package |
500 | Banana | 5 | Package |
800 | Banana | 5 | Package |
900 | Banana | 5 | Only |
1000 | Banana | 5 | Package |
I think it will helpful to show what you expect for output as I can see a least two different results.
Here is what it should be.
Customer_No | Fruit | Amount | Sale_Type | Banana_Package |
100 | Banana | 5 | Package | |
100 | Apple | 2 | Package | Yes |
100 | Orange | 3 | Package | Yes |
200 | Banana | 5 | Only | |
300 | Apple | 2 | Package | |
300 | Orange | 3 | Package | |
400 | Banana | 5 | Package | |
400 | Apple | 2 | Package | Yes |
400 | Orange | 3 | Package | Yes |
500 | Banana | 5 | Package | |
500 | Apple | 2 | Package | Yes |
500 | Orange | 3 | Package | Yes |
600 | Apple | 2 | Only | |
700 | Orange | 3 | Only | |
800 | Apple | 2 | Package | Yes |
800 | Banana | 5 | Package | |
900 | Banana | 5 | Only | |
1000 | Banana | 5 | Package | |
1000 | Orange | 3 | Package | Yes |
If the order of the fruit in the output isn't critical this appears to work.
If you need to keep order, then add a variable to All_fruit to use to sort on and the PROC sql code would use an
ORDER BY clause with that variable after the On comparison within the same ; to close the query.
data ALL_FRUIT;
input Customer_No Fruit $ Amount Sale_Type $;
datalines;
100 Banana 5 Package
100 Apple 2 Package
100 Orange 3 Package
200 Banana 5 Only
300 Apple 2 Package
300 Orange 3 Package
400 Banana 5 Package
400 Apple 2 Package
400 Orange 3 Package
500 Banana 5 Package
500 Apple 2 Package
500 Orange 3 Package
600 Apple 2 Only
700 Orange 3 Only
800 Apple 2 Package
800 Banana 5 Package
900 Banana 5 Only
1000 Banana 5 Package
1000 Orange 3 Package
;
run;
Data BANANAS_ONLY;
input Customer_No Fruit $ Amount Sale_Type $;
datalines;
100 Banana 5 Package
200 Banana 5 Only
400 Banana 5 Package
500 Banana 5 Package
800 Banana 5 Package
900 Banana 5 Only
1000 Banana 5 Package
;
run;
Proc sql;
create table new_all_fruit as
select a.*, case
when a.Sale_Type = 'Only' then ''
when missing(b.Customer_No) then ''
when a.fruit ne b.fruit then 'Yes'
end as Banana_package
from All_fruit as a left join BANANAS_ONLY as b
on a.Customer_No=b.Customer_No;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.