BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
scify
Obsidian | Level 7

I'm having an odd problem working with some data at work. I have the following code

data compare;set set1 (in=a) set2 (in=b);
	if b=1 then do;

		if points in (0 1) then newcat=1;
		if points in (2 3) then newcat=2;
		if points in (4 5 6) then newcat=3;
		if points in (7 8 9) then newcat=4;
		if points in (10 11 12) then newcat=5;
		if points>=13 then newcat=6;

		if source=1 and applied=1 and gdlinehi ne '' and table not in (1 2 3) then do;
			final=career+carmod;
			guide=level+modifier;
		end;
	end;
run;

Which seems fairly straightforward. However, when I run a check of the final and guide variables, I find that they exist for all of the observations in the data set, rather than the somewhat small subset that should have been selected by the IF statement. Through some testing, I've discovered that if I remove the first data set from the SET statement (so it reads as set set2 (in=b); ) the IF restrictions seem to work just fine. There's no crossover between the two data sets; I've checked via both PROC COMPARE and using the IN= variables.

 

Any idea what's going on here?

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

The symptoms yiou describe would occur of FINAL and GUIDE were part of SET1.

 

One easy solution would be to add just before the end of the DO block:

 

else do;

   final=.;

   guide=.;

end;

View solution in original post

12 REPLIES 12
dcruik
Lapis Lazuli | Level 10

Are you trying to append set1 and set2, or merge them?  Right now it seems you're trying to append the data sets.  I believe you need to modify your set statement to a merge statement if you are trying to merge the two data sets.  Also, just a suggestion to increase efficiency so you aren't reading every row of data with each of the if statements, but I would consider nesting them (for example change if then; if then to if then; else if then; else;).  Other than that, I don't see why the if statements in the do statement would be having an issue with the code you've provided.

ballardw
Super User

Are you saying you expect some VARIABLES not to exist for some records? Or that you have values for records you don't expect?

 

Rick_SAS
SAS Super FREQ

What do you mean when you say that the FINAL and GUIDE variables "exist for all of the observations in the data set"?

Your code will create new variables FINAL and GUIDE.  For some observations you are setting values for these variables, for others you are not.  But the variables exist for all observations.

 

For observations in the first data set, the FINAL and GUIDE variables will have missing values. Observations in the second data set will be assigned a value or will be missing.

 

What did you expect to happen?

scify
Obsidian | Level 7

"For observations in the first data set, the FINAL and GUIDE variables will have missing values. Observations in the second data set will be assigned a value or will be missing."

 

This is what should be happening, yes. However, I find that FINAL and GUIDE have a non-missing value for every observation from the second data set, rather than the subset indicated by the IF statement.

Astounding
PROC Star

As Rick mentioned, every variable exists on every observation.  That is always true of every SAS data set.  However, it is possible for a variable to have a missing value on some observations.

 

However ... if you are saying that some observations from SET1 have nonmissing values for FINAL and GUIDE ... that is possible but tricky.  It could happen if SET1 contains a variable named B.  Then you might encounter B=1 for some observations from SET1.

scify
Obsidian | Level 7

"However ... if you are saying that some observations from SET1 have nonmissing values for FINAL and GUIDE ... that is possible but tricky.  It could happen if SET1 contains a variable named B.  Then you might encounter B=1 for some observations from SET1."

It's more that all 75,000+ observations from SET2 have a non-missing value for FINAL and GUIDE, rather than the 2,269 that fit the IF statement criteria.

Astounding
PROC Star

Well, the obvious (probably wrong) possibility is that FINAL and GUIDE are part of SET2 originally.  The tricky (and probably correct) answer is that FINAL and GUIDE are part of SET1.  All variables that come from a SAS data set are automatically retained (whether they come from SET1 or SET2).  So when B=1 you would be replacing FINAL and GUIDE, but when B=(anything else) you would see the same value repeating from the most recent observation that had B=1.

Rick_SAS
SAS Super FREQ

Another guess: There could be statements in your code that you are not showing us and that are important.  For example, if your DATA step has a RETAIN statement such as

    retain final guide;

then that would explain the behaviour that you report.

scify
Obsidian | Level 7

Nope! I originally had more going on, but stripped it down to just what you see here before coming to the forums for help.

scify
Obsidian | Level 7

The FINAL and GUIDE variables don't exist on SET2 (which is why if I run this code on a data step with SET2, this doesn't happen). As mentioned in the original post, there is no overlap between the two data sets (they're from different years and have entirely different sets of observations). Also, as a test, I added in lines setting FINAL and GUIDE equal to missing at the beginning of the DO block (to see if there was some sort of unexpected carryover from observation to observation). This did not help.

Astounding
PROC Star

The symptoms yiou describe would occur of FINAL and GUIDE were part of SET1.

 

One easy solution would be to add just before the end of the DO block:

 

else do;

   final=.;

   guide=.;

end;

FreelanceReinh
Jade | Level 19

Please note that "setting FINAL and GUIDE equal to missing at the beginning of the DO block" is useless, because:

  • If the IF condition is met, FINAL and GUIDE are overwritten with non-missing values immediately.
  • If it is not met, FINAL and GUIDE may very well retain their non-missing values from the last observation where the condition was met, as @Astounding has pointed out. The important question (raised by @Astounding) is: Does SET1 contain variables named FINAL and GUIDE?

Last week there was a discussion about exactly this type of issue. Please see this thread.

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 2192 views
  • 4 likes
  • 6 in conversation