BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
valarievil
Obsidian | Level 7

Hello! I have three data sets. I would like to produce a table showing total sales by gender. I was able to get the correct results, but I'm concerned that If I had a larger data set, I can't use if statements. Here's my data so far:

 

data EMPLOY;
length ID $ 2.;
input ID Gender $ DOB MMDDYY10.;
datalines;
01 F 10/21/46
02 F 09/02/44
03 M 04/23/55
04 F 11/11/38
;
run;
proc sort data=EMPLOY;
by ID;
run;
proc print data=EMPLOY;
format DOB MMDDYY10.;
run;
data PARTS;
input partno price;
datalines;
123 15
234 25
327 20
355 28
789 55
;
proc print data=PARTS;
run;
data SALES;
length ID $ 2.;
input ID TRANS part_no Quantity;
datalines;
03 1 234 5
03 1 123 9
03 2 237 4
01 1 355 5
01 1 234 3
01 1 123 9
01 2 355 5
02 1 237 11
;
run;
proc sort data=SALES;
by ID;
run;
proc print data=SALES;
run;
data partssales;
merge SALES PARTS;
if part_no=123 then price=15;
if part_no=234 then price=25;
if part_no=237 then price=20;
if part_no=355 then price=28;
if part_no=789 then price=55;
total=Price*Quantity;
drop partno;
run;
/**************************************************/
proc sort data=partssales;
by TRANS;
run;
proc print data=partssales;
sum total;
by TRANS;
run;
proc print data=partssales;
sum total;
by ID;
run;
/**************************************************/
proc sort data=partssales;
by ID;
run;
data alldata;
set partssales;
if ID=01 then gender='F';
if ID=02 then gender='F';
if ID=03 then gender='M';
if ID=04 then gender='F';
run;
proc sort data=alldata;
by gender;
run;
proc print data=alldata;
sum total;
by gender;
run;

Once again, my goal is to reach the same results, without the use of "if then" statements because if this was a larger data set, it would not be practical. Thanks in advance for any advice!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

It often helps to have the same variable name in multiple sets for the same purpose: i.e. partno everywhere.

 

Also noticed that your Parts data set does not have a Partno 237 but does have a 327. A typo perhaps?

 

This could be done with sequential merges:

data work.EMPLOY;
   length ID $ 2.;
   input ID Gender $ DOB MMDDYY10.;
   format dob mmddyy10.;
datalines;
01 F 10/21/46
02 F 09/02/44
03 M 04/23/55
04 F 11/11/38
;
run;
proc sort data=work.EMPLOY;
   by ID;
run;
data work.SALES;
   length ID $ 2.;
   input ID TRANS partno Quantity;
datalines;
03 1 234 5
03 1 123 9
03 2 237 4
01 1 355 5
01 1 234 3
01 1 123 9
01 2 355 5
02 1 237 11
;
run;
proc sort data=work.sales;
   by id;
run;

data work.employ_sales;
   merge work.employ 
         work.Sales (in=insales)
   ;
   by id;
   if insales;
run;

data work.PARTS;
input partno price;
datalines;
123 15
234 25
237 20
355 28
789 55
;
proc sort data=work.PARTS;
   by partno;
run;
proc sort data=work.employ_sales;
   by partno;
run;

data want;
   merge work.employ_sales (in=insales)
         work.PARTS 
   ;
   by partno;
   if insales;
run;

The data set option (in= var) creates a temporary variable that indicates that a current record is from that data set. This is needed because you only want the employees with sales to get the gender and DOB added to the sales data (drop DOB if not wanted/needed). Similar you would only want to bring in the price information for the Parts that have been sold.

 

 

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

@valarievil wrote:

Once again, my goal is to reach the same results, without the use of "if then" statements because if this was a larger data set, it would not be practical. Thanks in advance for any advice!


If the Gender is specified in data set EMPLOY, there's no need for you to type it in via IF-THEN statements. You just use the Gender specified in EMPLOY. Same thing with the variable Price, no need to type these in via IF-THEN statements. This is a great example where you can just join the data sets all in one PROC SQL step.

 

/* UNTESTED CODE */

proc sql;
    create table all as select s.*,p.price,e.gender
    from sales s left join parts p on s.part_no=p.partno
    /* Note: you have spelled the variable differently in these data sets */
    left join employ e on s.id=e.id;
quit;
--
Paige Miller
ballardw
Super User

It often helps to have the same variable name in multiple sets for the same purpose: i.e. partno everywhere.

 

Also noticed that your Parts data set does not have a Partno 237 but does have a 327. A typo perhaps?

 

This could be done with sequential merges:

data work.EMPLOY;
   length ID $ 2.;
   input ID Gender $ DOB MMDDYY10.;
   format dob mmddyy10.;
datalines;
01 F 10/21/46
02 F 09/02/44
03 M 04/23/55
04 F 11/11/38
;
run;
proc sort data=work.EMPLOY;
   by ID;
run;
data work.SALES;
   length ID $ 2.;
   input ID TRANS partno Quantity;
datalines;
03 1 234 5
03 1 123 9
03 2 237 4
01 1 355 5
01 1 234 3
01 1 123 9
01 2 355 5
02 1 237 11
;
run;
proc sort data=work.sales;
   by id;
run;

data work.employ_sales;
   merge work.employ 
         work.Sales (in=insales)
   ;
   by id;
   if insales;
run;

data work.PARTS;
input partno price;
datalines;
123 15
234 25
237 20
355 28
789 55
;
proc sort data=work.PARTS;
   by partno;
run;
proc sort data=work.employ_sales;
   by partno;
run;

data want;
   merge work.employ_sales (in=insales)
         work.PARTS 
   ;
   by partno;
   if insales;
run;

The data set option (in= var) creates a temporary variable that indicates that a current record is from that data set. This is needed because you only want the employees with sales to get the gender and DOB added to the sales data (drop DOB if not wanted/needed). Similar you would only want to bring in the price information for the Parts that have been sold.

 

 

valarievil
Obsidian | Level 7

Thank you! by the way the different names of the part number was intentional to answer a different question. Also prolly cause I could get my datasets to merge properly.

KachiM
Rhodochrosite | Level 12

@valarievil 

 

You have received the solution already.

 

Without explicit sorting of data sets, we can use hash objects to get your solution with the aggregated sum within Gender. Here is one way of using hash objects. Assuming SALES data set will be huge, place other two data sets into hash tables. As each record from Sales are processed, the  (Price * Quantity) can be accumulated by replacing SUM in the hash table (E). Finally, we can write out the contents of E into a WANT data set.

 

Edited:

data _null_; if _n_ = 1 then do; if 0 then set parts; Sum = 0; declare hash p(dataset:'Parts'); p.definekey('partno'); p.definedata('price'); p.definedone(); declare hash e (); e.definekey('ID'); e.definedata('ID','gender','DOB','Sum'); e.definedone(); do until(eof); set employ end = eof; e.add(); end; end; set sales end = last; rp = p.find(); re = e.find();
if rp = 0 and re = 0 then do; Sum + price * quantity; e.replace();
end; if last; e.output(dataset:'want'); run;

 

 

 

The output of WANT is:

 

Capture_01.JPG

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 811 views
  • 0 likes
  • 4 in conversation