BookmarkSubscribeRSS Feed
DavidPhillips2
Rhodochrosite | Level 12

I have two sets of rows in this table.  The general rows and a redundant total row.  

 

Is there a simple way to do this?  I can think of a few that involve 10 lines of code.

 

data merged_SAL; set merged_SAL;
if CONTRACT_LENGTH = ('9 10 11 12') then do;
SALARY_OUTLAYS = /*sum of salary_outlays where CONTRACT_LENGTH in ('9', '10', '11', '12') */
end;
run;

 

 

8 REPLIES 8
DavidPhillips2
Rhodochrosite | Level 12

Note in an example like this:

data test;
infile cards;
input A1 A2 A3 A4 A5;
cards;
2 2 5 1 0
;
run;

data sums;
 set test;

 /* method 1: array of all numerics */
 array n {*} _numeric_;
 sum_allnum=sum(of n[*]);

 /*method 2: variable range in data set */
 sum_range=sum(of a1-a5);
run;

The author is summing across columns.  I am suming across rows.

SASKiwi
PROC Star

Here is one way with SQL. You haven't really explained your problem completely - it would help if you supplied sample input and output data - so I'm guessing this is close to what you want. If you don't want sums by contract_length take out the GROUP BY.

 

proc sql;
  create table want as
  select A.*
           ,B.salary_outlays_sum
  from merged_sal as A
  left join
  (select contract_length
             ,sum(salary_outlays) as salary_outlays_sum
   from  merged_sal
  where contract_length in ('9','10','11','12')
  group by contract_length
  ) as B
  on A.contract_length = B.contract_length
  ;
quit;
mkeintz
PROC Star

You are summing over rows.  That much is obvious.  Are you saying you want to PRODUCE a redundant ROW, which is the sum of the contract_lengths between 9 and 12?  Or are you producing a new column with row sums for contact lengths 9, 10, 11, and 12, respectively.  I.e. if the individual contract is 9, you want to add a SALARY_OUTLAYS_SUM variable summing all length 9 contracts.  That's what @SASKiwi did in SQL.  Here's how in a data step:

 

data want;
  set sal_merged (in=prelim where=(9<=contract_length<=12))
      sal_merged (in=inkeep);
  array tmpsum {9:12} _temporary_;
  if prelim then tmpsum{contract_length}+salary_outlays;
  if inkeep;
  if (9<=contract_length<=12) then salary_outlays_sum=tmpsum{contract_length};
run;

I assume that contract_length is a numeric variable.  The SET statement reads the data set twice, first to generate sums in a temporary array, and the second to output while retrieve sums from the same array.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DavidPhillips2
Rhodochrosite | Level 12

Sample data is below:

 

 

 

data sal_merged;

infile datalines;

input identifer contract_length salary_outlays ;

return;

datalines;

1 9 10000

2 9 20000

3 10 10000

4 11 30000

5 12 10000

6 9 20000

7 11 30000

8 10 30000

9 12 20000

10 13 40000

11 13 40000

12 13 20000

;

run;

 

I’m trying to sum the values of contact lengths 9-12 as the value for contract length 13.

DavidPhillips2
Rhodochrosite | Level 12

This works.  Is there a shorter solution?

 

data sal_merged;
infile datalines;
input identifer contract_length salary_outlays gender $1.;
return;
datalines;
1 9 10000 M
2 9 20000 M
3 10 10000 M
4 11 30000 M
5 12 10000 M
6 9 20000 F
7 11 30000 F
8 10 30000 F
9 12 20000 F
10 13 . F
11 13 . M
;
run;

proc sql;
create table tempSalaryOutlays as
select gender
,sum(salary_outlays) as salary_outlays_sum
from sal_merged
where contract_length in (9,10,11,12)
group by gender;

quit;

proc sort data=sal_merged;
by gender;
run;
proc sort data=tempSalaryOutlays;
by gender;
run;

data sal_merged; merge sal_merged (in=x) tempSalaryOutlays;
by gender;
if contract_length = 13 then salary_outlays = salary_outlays_sum;
run;

mkeintz
PROC Star

@DavidPhillips2   Your program is not the shortest, but more importantly, it must have a problem somewhere.  I ran it and it produces a sum of salaries of 80,000 for contract length 9, but I see only 3 observations with contract length 9  (10000, 20000, and 20000).

 

As to shortest, I believe my single data step qualifies.  And now, then I edited it, it actually works.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DavidPhillips2
Rhodochrosite | Level 12

The total is correct.  Note, the program updates salary outlays.  You can drop the salary_outlays_sum column at the end if you want to.  I left it in for edit check reasons.

 

I modified your program slightly to update contract length 13 only.  I'm trying to figure out how to use your program to do the same split out by gender like the code I posted earlier.  

 

At the end of the below code is a dataset that includes gender.

 

data sal_merged;

infile datalines;

input identifer contract_length salary_outlays ;

return;

datalines;

1 9 10000

2 9 20000

3 10 10000

4 11 30000

5 12 10000

6 9 20000

7 11 30000

8 10 30000

9 12 20000

10 13 40000

11 13 40000

12 13 20000

;

run;

data want;
set sal_merged (in=prelim where=(9<=contract_length<=12))
sal_merged (in=inkeep);
array tmpsum {9:12} _temporary_;
if prelim then tmpsum{contract_length}+salary_outlays;
if inkeep;
if (contract_length=13) then salary_outlays=tmpsum{9} + tmpsum{10} + tmpsum{11} +tmpsum{12};
run;


data sal_merged;
infile datalines;
input identifer contract_length salary_outlays gender $1.;
return;
datalines;
1 9 10000 M
2 9 20000 M
3 10 10000 M
4 11 30000 M
5 12 10000 M
6 9 20000 F
7 11 30000 F
8 10 30000 F
9 12 20000 F
10 13 . F
11 13 . M
;

mkeintz
PROC Star

@DavidPhillips2   Ah, it was my misinterpretation.  Thanks.

 

As to modification for gender ... If the dataset is sorted by gender, you can utilize the BY statement (untested):

 

data want;

  set salmerged (where=(9<=contract_length<=12) in=prelim)

        salmerged;

  by descending gender;

  array tmpsum{9:12} _temporary_;

  if first.gender then call missing(of tmpsum{*});

 

  if prelim then tmpsum{contract_length}+salary_outlays;

  if inkeep;
  if (contract_length=13) then salary_outlays=sum(of tmpsum{*});
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1015 views
  • 0 likes
  • 3 in conversation