Hello! I have three data sets. I would like to produce a table showing total sales by gender. I was able to get the correct results, but I'm concerned that If I had a larger data set, I can't use if statements. Here's my data so far:
data EMPLOY;
length ID $ 2.;
input ID Gender $ DOB MMDDYY10.;
datalines;
01 F 10/21/46
02 F 09/02/44
03 M 04/23/55
04 F 11/11/38
;
run;
proc sort data=EMPLOY;
by ID;
run;
proc print data=EMPLOY;
format DOB MMDDYY10.;
run;
data PARTS;
input partno price;
datalines;
123 15
234 25
327 20
355 28
789 55
;
proc print data=PARTS;
run;
data SALES;
length ID $ 2.;
input ID TRANS part_no Quantity;
datalines;
03 1 234 5
03 1 123 9
03 2 237 4
01 1 355 5
01 1 234 3
01 1 123 9
01 2 355 5
02 1 237 11
;
run;
proc sort data=SALES;
by ID;
run;
proc print data=SALES;
run;
data partssales;
merge SALES PARTS;
if part_no=123 then price=15;
if part_no=234 then price=25;
if part_no=237 then price=20;
if part_no=355 then price=28;
if part_no=789 then price=55;
total=Price*Quantity;
drop partno;
run;
/**************************************************/
proc sort data=partssales;
by TRANS;
run;
proc print data=partssales;
sum total;
by TRANS;
run;
proc print data=partssales;
sum total;
by ID;
run;
/**************************************************/
proc sort data=partssales;
by ID;
run;
data alldata;
set partssales;
if ID=01 then gender='F';
if ID=02 then gender='F';
if ID=03 then gender='M';
if ID=04 then gender='F';
run;
proc sort data=alldata;
by gender;
run;
proc print data=alldata;
sum total;
by gender;
run;
Once again, my goal is to reach the same results, without the use of "if then" statements because if this was a larger data set, it would not be practical. Thanks in advance for any advice!
It often helps to have the same variable name in multiple sets for the same purpose: i.e. partno everywhere.
Also noticed that your Parts data set does not have a Partno 237 but does have a 327. A typo perhaps?
This could be done with sequential merges:
data work.EMPLOY; length ID $ 2.; input ID Gender $ DOB MMDDYY10.; format dob mmddyy10.; datalines; 01 F 10/21/46 02 F 09/02/44 03 M 04/23/55 04 F 11/11/38 ; run; proc sort data=work.EMPLOY; by ID; run; data work.SALES; length ID $ 2.; input ID TRANS partno Quantity; datalines; 03 1 234 5 03 1 123 9 03 2 237 4 01 1 355 5 01 1 234 3 01 1 123 9 01 2 355 5 02 1 237 11 ; run; proc sort data=work.sales; by id; run; data work.employ_sales; merge work.employ work.Sales (in=insales) ; by id; if insales; run; data work.PARTS; input partno price; datalines; 123 15 234 25 237 20 355 28 789 55 ; proc sort data=work.PARTS; by partno; run; proc sort data=work.employ_sales; by partno; run; data want; merge work.employ_sales (in=insales) work.PARTS ; by partno; if insales; run;
The data set option (in= var) creates a temporary variable that indicates that a current record is from that data set. This is needed because you only want the employees with sales to get the gender and DOB added to the sales data (drop DOB if not wanted/needed). Similar you would only want to bring in the price information for the Parts that have been sold.
@valarievil wrote:
Once again, my goal is to reach the same results, without the use of "if then" statements because if this was a larger data set, it would not be practical. Thanks in advance for any advice!
If the Gender is specified in data set EMPLOY, there's no need for you to type it in via IF-THEN statements. You just use the Gender specified in EMPLOY. Same thing with the variable Price, no need to type these in via IF-THEN statements. This is a great example where you can just join the data sets all in one PROC SQL step.
/* UNTESTED CODE */
proc sql;
create table all as select s.*,p.price,e.gender
from sales s left join parts p on s.part_no=p.partno
/* Note: you have spelled the variable differently in these data sets */
left join employ e on s.id=e.id;
quit;
It often helps to have the same variable name in multiple sets for the same purpose: i.e. partno everywhere.
Also noticed that your Parts data set does not have a Partno 237 but does have a 327. A typo perhaps?
This could be done with sequential merges:
data work.EMPLOY; length ID $ 2.; input ID Gender $ DOB MMDDYY10.; format dob mmddyy10.; datalines; 01 F 10/21/46 02 F 09/02/44 03 M 04/23/55 04 F 11/11/38 ; run; proc sort data=work.EMPLOY; by ID; run; data work.SALES; length ID $ 2.; input ID TRANS partno Quantity; datalines; 03 1 234 5 03 1 123 9 03 2 237 4 01 1 355 5 01 1 234 3 01 1 123 9 01 2 355 5 02 1 237 11 ; run; proc sort data=work.sales; by id; run; data work.employ_sales; merge work.employ work.Sales (in=insales) ; by id; if insales; run; data work.PARTS; input partno price; datalines; 123 15 234 25 237 20 355 28 789 55 ; proc sort data=work.PARTS; by partno; run; proc sort data=work.employ_sales; by partno; run; data want; merge work.employ_sales (in=insales) work.PARTS ; by partno; if insales; run;
The data set option (in= var) creates a temporary variable that indicates that a current record is from that data set. This is needed because you only want the employees with sales to get the gender and DOB added to the sales data (drop DOB if not wanted/needed). Similar you would only want to bring in the price information for the Parts that have been sold.
Thank you! by the way the different names of the part number was intentional to answer a different question. Also prolly cause I could get my datasets to merge properly.
You have received the solution already.
Without explicit sorting of data sets, we can use hash objects to get your solution with the aggregated sum within Gender. Here is one way of using hash objects. Assuming SALES data set will be huge, place other two data sets into hash tables. As each record from Sales are processed, the (Price * Quantity) can be accumulated by replacing SUM in the hash table (E). Finally, we can write out the contents of E into a WANT data set.
Edited:
data _null_;
if _n_ = 1 then do;
if 0 then set parts;
Sum = 0;
declare hash p(dataset:'Parts');
p.definekey('partno');
p.definedata('price');
p.definedone();
declare hash e ();
e.definekey('ID');
e.definedata('ID','gender','DOB','Sum');
e.definedone();
do until(eof);
set employ end = eof;
e.add();
end;
end;
set sales end = last;
rp = p.find();
re = e.find();
if rp = 0 and re = 0 then do;
Sum + price * quantity;
e.replace();
end;
if last;
e.output(dataset:'want');
run;
The output of WANT is:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.