data example;
set example1;
retain _obs 0;
by site;
if first.site then _obs = 0;
_obs + 1;
run;
Based on the code above, I expected to get an output dataset something like this:
site _obs
100 1
100 2
100 3
200 1
200 2
300 1
300 2
300 3
300 4
But instead, I got an output in which the counter does not restart for site 300:
site _obs
100 1
100 2
100 3
200 1
200 2
300 3
300 4
300 5
300 6
Supposedly, the variable _obs should reset to zero (0) whenever first.site is true, but it did not. I am not sure what the problem is. Is this a syntax error or a flaw in SAS?
I would really appreciate your help.
Thank you.
The code you posted seems to work, and makes sense. (Try running what I have below.) Perhaps double-check that you ran the same thing. If you're still seeing a different result, please include the code that produced the output you were seeing.
data example1;
input site $;
datalines;
100
100
100
200
200
300
300
300
300
;
run;
data example;
set example1;
retain _obs 0;
by site;
if first.site then _obs = 0;
_obs + 1;
run;
I'm sorry. I forgot to include one statement from the data step.
I'm not sure if it would play an important role in distorting my output.
if vistdat le &cutdate;
I actually resolved the issue by removing the if statement which subsets the dataset by date.
Anyways, thank you for your help! 🙂
Is there any chance that your example1 data set already has a variable named _obs in it?
When the variable already exists in the source data set, RETAIN does not do quite what you expect, because each time the SET statement reads a record, the value from that record overwrites the retained value.
Consider:
data example1;
input site $ _obs;
datalines;
100 1
100 1
100 1
200 1
200 1
300 1
300 1
300 1
300 1
;
run;
data example;
set example1;
retain _obs 0 newobs 0;
by site;
if first.site then do;
_obs = 0;
newobs = 0;
end;
_obs + 1;
newobs + 1;
run;
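_obs and newobs behave quite differently: newobs counts 1, 2, 3, ... within each site, while _obs is overwritten with the value 1 from each incoming record, so it ends up as 1 on the first record of a site and 2 on every other record.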
Based on a few of your posts put together, here's what happened and why removing the IF statement worked.
The program created the BY variables (FIRST.SITE and LAST.SITE) correctly. However, the order of the statements mattered because the subsetting IF deleted observations before the step reached the logic that resets _OBS. When the first observation for a site was deleted, the reset never executed for that site, so the count carried over from the previous site.
If you don't have a WHERE statement in your DATA step already, that would be the simple solution. Change this:
if vistdat le &cutdate;
to this:
where vistdat le &cutdate;
The WHERE statement subsets differently than IF. With IF, the DATA step reads in every observation and then deletes some of them. With WHERE, the DATA step reads in only the observations that meet the WHERE condition. As a result, FIRST.SITE is set on the first observation in each BY group that passes the condition, which is what you need here.
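To make the difference concrete, here is a small sketch you can run. The VISTDAT values, the cutoff date, and the data set names with_if and with_where are made up for illustration, and I am assuming the subsetting IF sat before the reset logic in your original step.

```sas
/* Hypothetical cutoff date; the real &cutdate is not shown in the thread */
%let cutdate = '01FEB2024'd;

/* Hypothetical test data, already sorted by site.
   The first record for site 300 falls after the cutoff. */
data example1;
    input site $ vistdat :date9.;
    format vistdat date9.;
    datalines;
100 01JAN2024
100 15JAN2024
200 01JAN2024
200 15JAN2024
300 01MAR2024
300 15JAN2024
300 20JAN2024
;
run;

/* Subsetting IF placed before the reset: the first site-300 record is
   deleted before the reset runs, so _obs carries on from site 200. */
data with_if;
    set example1;
    by site;
    if vistdat le &cutdate;
    if first.site then _obs = 0;
    _obs + 1;
run;

/* WHERE filters records as they are read, so FIRST.SITE is set on the
   first site-300 record that passes the condition. */
data with_where;
    set example1;
    where vistdat le &cutdate;
    by site;
    if first.site then _obs = 0;
    _obs + 1;
run;

proc print data=with_if; run;
proc print data=with_where; run;
```

In with_if, the two remaining site-300 records should come out as 3 and 4. In with_where, they should come out as 1 and 2.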
Thanks, It works!