Hello SAS community,
I have a table like this :
obs position Level HC
1 1 1 1
2 2 2 1
3 3 3 2
4 4 4 1
5 5 2 1
6 6 3 1
7 7 4 1
8 8 4 2
I want to calculate by position the sum of head count under that position based on the hierarchy level seen in the column Level, like this :
obs position Level HC HC for position under
1 1 1 1 9
2 2 2 1 3 (sum of headcount until reach the same level without crossing lower value of column Level)
3 3 3 2 1
4 4 4 1 0 (lowest level so 0 position under, but the lowest level could change (could become 5, 6 or more))
5 5 2 1 4
6 6 3 1 1
7 7 4 1 0
8 8 4 2 0
I have tried multiple combination of looping and condition but I can't seem to wrap my head around the solution.
Can someone help me on this?
Thank you so much SAS community
The interesting parts of this problem are that the sums of each row are calculated based on succeeding rows and each individual headcount (hc) may be contributing to multiple sums at various levels; for example, the headcount at position 4 contributes to the sums for levels 1, 2 and 3.
My solution involves reading the dataset twice in the same data step. The first set establishes the current position and level, and the 2nd set determines which rows are summed based on the conditions given by the OP.
data have;
infile datalines;
input _obs position level hc expected_hc_sum;
drop _:;
datalines;
1 1 1 1 9
2 2 2 1 3
3 3 3 2 1
4 4 4 1 0
5 5 2 1 4
6 6 3 1 3
7 7 4 1 0
8 8 4 2 0
;
run;
data want;
set have; * read have;
calc_hc_sum = 0; * initialize calculated HC sum;
* loop through dataset have again independently to sum HC based on 2 conditions: ;
* 1) current position in 1st set < position in 2nd set.
* 2) current level in 1st set < level in 2nd set.;
do i = 1 to nobs;
set have (keep=position level hc rename=(position=_pos level=_lev hc=_hc))
nobs=nobs point=i;
if position < _pos then do; * only consider obs > curent position;
if level < _lev then calc_hc_sum + _hc; * only sum obs > current level;
else leave; * exit loop when encounter the same or higher level (smaller number);
end;
end;
drop _:;
run;
I thought I was following the logic until I got to obs. 6. Why is the HC 1 and not 3?
The interesting parts of this problem are that the sums of each row are calculated based on succeeding rows and each individual headcount (hc) may be contributing to multiple sums at various levels; for example, the headcount at position 4 contributes to the sums for levels 1, 2 and 3.
My solution involves reading the dataset twice in the same data step. The first set establishes the current position and level, and the 2nd set determines which rows are summed based on the conditions given by the OP.
data have;
infile datalines;
input _obs position level hc expected_hc_sum;
drop _:;
datalines;
1 1 1 1 9
2 2 2 1 3
3 3 3 2 1
4 4 4 1 0
5 5 2 1 4
6 6 3 1 3
7 7 4 1 0
8 8 4 2 0
;
run;
data want;
set have; * read have;
calc_hc_sum = 0; * initialize calculated HC sum;
* loop through dataset have again independently to sum HC based on 2 conditions: ;
* 1) current position in 1st set < position in 2nd set.
* 2) current level in 1st set < level in 2nd set.;
do i = 1 to nobs;
set have (keep=position level hc rename=(position=_pos level=_lev hc=_hc))
nobs=nobs point=i;
if position < _pos then do; * only consider obs > curent position;
if level < _lev then calc_hc_sum + _hc; * only sum obs > current level;
else leave; * exit loop when encounter the same or higher level (smaller number);
end;
end;
drop _:;
run;
data have;
infile datalines;
input _obs position level hc ;
drop _:;
datalines;
1 1 1 1 9
2 2 2 1 3
3 3 3 2 1
4 4 4 1 0
5 5 2 1 4
6 6 3 1 3
7 7 4 1 0
8 8 4 2 0
;
run;
data want;
if _n_=1 then do;
if 0 then set have(rename=(level=_level hc=_hc));
declare hash h(dataset:'have(rename=(level=_level hc=_hc))',ordered:'yes');
declare hiter hi('h');
h.definekey('position');
h.definedata('_level','_hc');
h.definedone();
end;
set have;
want=0;
rc=hi.setcur(key:position+1);
do while(rc=0);
if level=_level then leave;
if level>_level then do;rc=hi.next();continue;end;
want+_hc;
rc=hi.next();
end;
drop rc _:;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.