BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
GeorgeBonanza
Quartz | Level 8

Below is a simple example of a more complex task I am trying to accomplish.  What I want to do is read through all the data and accumulate AMT grouped by product using a hash object and output the hash object as a dataset once the end of the input is reached.  I want to only include products A and B and need to do this with a subsetting IF statement.  The two input datasets, HAVE1 and HAVE2 are identical except for the order of the records.  I get the output I want when I use HAVE1 but not with HAVE2 and I think it is because the last record in HAVE2 doesn't satisfy the subsetting IF statement.  I am sure there is a simple solution, I am just not seeing it.  Any insight is appreciated.  Thanks in advance.

data work.HAVE1;
	input PRODUCT $ AMT;
	datalines;
A 100
A 200
B 500
C 1000
B 800
;
run;

data work.HAVE2;
	input PRODUCT $ AMT;
	datalines;
A 100
A 200
B 500
B 800
C 1000
;
run;

data _null_;
	if _n_ = 1 then do;
		declare hash tot();
		tot.definekey ("PRODUCT");
		tot.definedata ("PRODUCT", "TOTAL");
		tot.definedone();
	end;
	do until (EOF);
		set work.HAVE1 end= EOF;

		if PRODUCT in ("A","B");

		if tot.find() ne 0 then call MISSING(TOTAL);
		TOTAL + AMT;
		tot.replace();
	end;
	tot.output(dataset: "work.WANT");
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

This is good example of a program that violates a top down structure.  You have a DO loop where the process flow does not always enter from the top and exit from the bottom.  Instead because of the subsetting IF statement sometimes the process jumps out of the DO loop and starts another iteration of the data step.

 

If you wan to keep the subsetting IF move the hash output to BEFORE the DO loop.

if eof then tot.output(dataset: "work.WANT");
do until( eof );
  ...

In this case there is no need for a subsetting if.  You could convert to a WHERE statement instead.

where PRODUCT in ("A","B");

or an IF/THEN block.

if PRODUCT in ("A","B") then do;
  if tot.find() ne 0 then call MISSING(TOTAL);
  TOTAL + AMT;
  tot.replace();
end;

Once you eliminate jumping out of the DO loop you can simplify the program.  You no longer need the IF _N_=1 test since there can only be on iteration.

data _null_;
  declare hash tot();
  tot.definekey ("PRODUCT");
  tot.definedata ("PRODUCT", "TOTAL");
  tot.definedone();
  do until (EOF);
    set work.HAVE1 end= EOF;
    where PRODUCT in ("A","B");
    if tot.find() ne 0 then call MISSING(TOTAL);
    TOTAL + AMT;
    tot.replace();
  end;
  tot.output(dataset: "work.WANT");
  stop;
run;

 

View solution in original post

4 REPLIES 4
Astounding
PROC Star

You have correctly diagnosed the problem.  When the subsetting IF deletes the final observation, the program never gets to the test for the END= variable.  A simple solution would be to replace these statements:

		set work.HAVE1 end= EOF;
		if PRODUCT in ("A","B");

Instead, switch from IF to WHERE:

		set work.HAVE1 (where=(PRODUCT in ("A", "B"))) end= EOF; 

 

GeorgeBonanza
Quartz | Level 8

Thank you for taking the time to respond. In my actual task, I am reading from multiple input datasets, i.e. always the most recent 5 years, so I am trying to make the program dynamic. The challenge is that in the current year the rule for identifying the records to keep has changed and there are fields on the 2022 dataset that are not on the 2018-2021 datasets. That is why I am using the subsetting IF statement and not a WHERE dataset option.

Tom
Super User Tom
Super User

This is good example of a program that violates a top down structure.  You have a DO loop where the process flow does not always enter from the top and exit from the bottom.  Instead because of the subsetting IF statement sometimes the process jumps out of the DO loop and starts another iteration of the data step.

 

If you wan to keep the subsetting IF move the hash output to BEFORE the DO loop.

if eof then tot.output(dataset: "work.WANT");
do until( eof );
  ...

In this case there is no need for a subsetting if.  You could convert to a WHERE statement instead.

where PRODUCT in ("A","B");

or an IF/THEN block.

if PRODUCT in ("A","B") then do;
  if tot.find() ne 0 then call MISSING(TOTAL);
  TOTAL + AMT;
  tot.replace();
end;

Once you eliminate jumping out of the DO loop you can simplify the program.  You no longer need the IF _N_=1 test since there can only be on iteration.

data _null_;
  declare hash tot();
  tot.definekey ("PRODUCT");
  tot.definedata ("PRODUCT", "TOTAL");
  tot.definedone();
  do until (EOF);
    set work.HAVE1 end= EOF;
    where PRODUCT in ("A","B");
    if tot.find() ne 0 then call MISSING(TOTAL);
    TOTAL + AMT;
    tot.replace();
  end;
  tot.output(dataset: "work.WANT");
  stop;
run;

 

GeorgeBonanza
Quartz | Level 8
Thank you Tom. Your responses are always detailed and usually contain multiple ways to solve the problem. I always learn something from your solutions. I appreciate you taking the time to respond.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1605 views
  • 1 like
  • 3 in conversation