Below is a simple example of a more complex task I am trying to accomplish. What I want to do is read through all the data and accumulate AMT grouped by product using a hash object and output the hash object as a dataset once the end of the input is reached. I want to only include products A and B and need to do this with a subsetting IF statement. The two input datasets, HAVE1 and HAVE2 are identical except for the order of the records. I get the output I want when I use HAVE1 but not with HAVE2 and I think it is because the last record in HAVE2 doesn't satisfy the subsetting IF statement. I am sure there is a simple solution, I am just not seeing it. Any insight is appreciated. Thanks in advance.
data work.HAVE1;
input PRODUCT $ AMT;
datalines;
A 100
A 200
B 500
C 1000
B 800
;
run;
data work.HAVE2;
input PRODUCT $ AMT;
datalines;
A 100
A 200
B 500
B 800
C 1000
;
run;
data _null_;
if _n_ = 1 then do;
declare hash tot();
tot.definekey ("PRODUCT");
tot.definedata ("PRODUCT", "TOTAL");
tot.definedone();
end;
do until (EOF);
set work.HAVE1 end= EOF;
if PRODUCT in ("A","B");
if tot.find() ne 0 then call MISSING(TOTAL);
TOTAL + AMT;
tot.replace();
end;
tot.output(dataset: "work.WANT");
run;
This is good example of a program that violates a top down structure. You have a DO loop where the process flow does not always enter from the top and exit from the bottom. Instead because of the subsetting IF statement sometimes the process jumps out of the DO loop and starts another iteration of the data step.
If you wan to keep the subsetting IF move the hash output to BEFORE the DO loop.
if eof then tot.output(dataset: "work.WANT");
do until( eof );
...
In this case there is no need for a subsetting if. You could convert to a WHERE statement instead.
where PRODUCT in ("A","B");
or an IF/THEN block.
if PRODUCT in ("A","B") then do;
if tot.find() ne 0 then call MISSING(TOTAL);
TOTAL + AMT;
tot.replace();
end;
Once you eliminate jumping out of the DO loop you can simplify the program. You no longer need the IF _N_=1 test since there can only be on iteration.
data _null_;
declare hash tot();
tot.definekey ("PRODUCT");
tot.definedata ("PRODUCT", "TOTAL");
tot.definedone();
do until (EOF);
set work.HAVE1 end= EOF;
where PRODUCT in ("A","B");
if tot.find() ne 0 then call MISSING(TOTAL);
TOTAL + AMT;
tot.replace();
end;
tot.output(dataset: "work.WANT");
stop;
run;
You have correctly diagnosed the problem. When the subsetting IF deletes the final observation, the program never gets to the test for the END= variable. A simple solution would be to replace these statements:
set work.HAVE1 end= EOF;
if PRODUCT in ("A","B");
Instead, switch from IF to WHERE:
set work.HAVE1 (where=(PRODUCT in ("A", "B"))) end= EOF;
Thank you for taking the time to respond. In my actual task, I am reading from multiple input datasets, i.e. always the most recent 5 years, so I am trying to make the program dynamic. The challenge is that in the current year the rule for identifying the records to keep has changed and there are fields on the 2022 dataset that are not on the 2018-2021 datasets. That is why I am using the subsetting IF statement and not a WHERE dataset option.
This is good example of a program that violates a top down structure. You have a DO loop where the process flow does not always enter from the top and exit from the bottom. Instead because of the subsetting IF statement sometimes the process jumps out of the DO loop and starts another iteration of the data step.
If you wan to keep the subsetting IF move the hash output to BEFORE the DO loop.
if eof then tot.output(dataset: "work.WANT");
do until( eof );
...
In this case there is no need for a subsetting if. You could convert to a WHERE statement instead.
where PRODUCT in ("A","B");
or an IF/THEN block.
if PRODUCT in ("A","B") then do;
if tot.find() ne 0 then call MISSING(TOTAL);
TOTAL + AMT;
tot.replace();
end;
Once you eliminate jumping out of the DO loop you can simplify the program. You no longer need the IF _N_=1 test since there can only be on iteration.
data _null_;
declare hash tot();
tot.definekey ("PRODUCT");
tot.definedata ("PRODUCT", "TOTAL");
tot.definedone();
do until (EOF);
set work.HAVE1 end= EOF;
where PRODUCT in ("A","B");
if tot.find() ne 0 then call MISSING(TOTAL);
TOTAL + AMT;
tot.replace();
end;
tot.output(dataset: "work.WANT");
stop;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.