pdkc494949
Calcite | Level 5
data example;
    set example1;
    retain _obs 0;
    by site;
    if first.site then _obs = 0;
    _obs + 1;
run;

Based on the code above, I expected an output data set like this:

site  _obs
100   1
100   2
100   3
200   1
200   2
300   1
300   2
300   3
300   4

 

But instead, I got an output in which the counter does not restart for site 300:

site  _obs
100   1
100   2
100   3
200   1
200   2
300   3
300   4
300   5
300   6

The variable _obs should reset to zero whenever first.site is true, but for site 300 it did not. I am not sure what the problem is. Is this a syntax error or a flaw in SAS?

I would really appreciate your help. 

 

Thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Based on a few of your posts put together, here's what happened and why removing the IF statement worked.

 

The program set the FIRST.SITE and LAST.SITE flags accurately.  However, the order of the statements made a difference because the subsetting IF deleted observations.  More specifically, it deleted observations before execution reached the logic that resets _OBS, so the reset never ran for the observations that were deleted.  When a deleted observation was the first one for a site, _OBS simply kept counting from the previous site.

 

If you don't have a WHERE statement in your DATA step already, that would be the simple solution.  Change this:

if vistdat le &cutdate;

to this:

where vistdat le &cutdate;

The WHERE statement subsets differently than IF.  When using IF, the DATA step reads in observations and then deletes some of them.  When using WHERE instead, the DATA step reads in only the observations that meet the WHERE condition.  As a result, the FIRST.SITE and LAST.SITE flags are set on the observations that remain (which is what you need here).
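To see the difference side by side, here is a minimal sketch. The visit dates and the value of &cutdate are invented for illustration, and it assumes the subsetting IF sat before the reset logic, as described above. The first row for site 300 falls after the cutoff, which should reproduce the counts from the original question.

```sas
/* Hypothetical cutoff and visit dates, invented for illustration only */
%let cutdate = '30JUN2020'd;

data example1;
    input site $ vistdat :date9.;
    format vistdat date9.;
    datalines;
100 01JAN2020
100 01FEB2020
100 01MAR2020
200 01JAN2020
200 01FEB2020
300 01JUL2020
300 01MAR2020
300 01APR2020
300 01MAY2020
300 01JUN2020
;
run;

/* Subsetting IF: the 01JUL2020 row carries FIRST.SITE=1 for site 300, */
/* but it is deleted before the reset runs, so _obs continues from 2.  */
data with_if;
    set example1;
    by site;
    if vistdat le &cutdate;
    if first.site then _obs = 0;
    _obs + 1;
run;

/* WHERE: the 01JUL2020 row is never read, so FIRST.SITE is set on the */
/* first remaining row for site 300 and _obs restarts at 1.            */
data with_where;
    set example1;
    where vistdat le &cutdate;
    by site;
    if first.site then _obs = 0;
    _obs + 1;
run;

proc print data=with_if; run;
proc print data=with_where; run;
```

In WITH_IF, site 300 should be numbered 3 through 6; in WITH_WHERE it should be numbered 1 through 4. Keep in mind that WHERE can only reference variables that exist in the input data set. The WHERE= data set option, as in set example1(where=(vistdat le &cutdate)), should behave the same way if you prefer to keep the condition on the SET statement.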

mklangley
Lapis Lazuli | Level 10

The code you posted seems to work and makes sense. (Try running what I have below.) Perhaps double-check that you ran the same thing. If you're still seeing a different result, please include the code that produced the output you were seeing.

data example1;
    input site $;
    datalines;
    100
    100
    100
    200
    200
    300
    300
    300
    300
    ;
run;

data example;
     set example1;
     retain _obs 0;
     by site;
     if first.site then _obs = 0;
     _obs + 1;
run;
pdkc494949
Calcite | Level 5
Thank you for your reply. My actual input data is quite long and complicated. The data I wrote in the description is a simplified version of it. I'll get back to you if I can copy my actual data in here.
pdkc494949
Calcite | Level 5

I'm sorry. I forgot to include one statement from the data step.

I'm not sure if it would play an important role in distorting my output. 

if vistdat le &cutdate;
mklangley
Lapis Lazuli | Level 10
Hmm. Hard to say without seeing your code or data. Could you include those?
pdkc494949
Calcite | Level 5

I actually resolved the issue by removing the if statement which subsets the dataset by date. 

Anyways, thank you for your help! 🙂 

ballardw
Super User

Is there any chance that your example1 data set already has a variable named _obs in it?

When the variable already exists in the source data set, RETAIN does not do quite what you expect, because each time SET reads the next record it overwrites the retained value with the value stored in that record.

Consider:

data example1;
    input site $ _obs;
    datalines;
    100  1
    100  1
    100  1
    200  1
    200  1
    300  1
    300  1
    300  1
    300  1
    ;
run;

data example;
     set example1;
     retain _obs 0 newobs 0;
     by site;
     if first.site then do;
         _obs = 0;
         newobs = 0;
     end;
     _obs + 1;
     newobs + 1;
run;

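_obs and newobs behave quite differently: _obs never gets past 2, because SET reloads the stored value of 1 on every record, while newobs counts 1, 2, 3, ... within each site.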
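If you are unsure whether the input already contains a variable with that name, a quick check along these lines may help. This is only a sketch: it assumes the input is WORK.EXAMPLE1 and that the counter is named _OBS, so adjust both names to your own data.

```sas
/* List only the variable names in the input data set */
proc contents data=example1 short;
run;

/* Or query the dictionary table; a row comes back only if _OBS exists */
proc sql;
    select name, type
    from dictionary.columns
    where libname = 'WORK'
      and memname = 'EXAMPLE1'
      and upcase(name) = '_OBS';
quit;
```

If the query returns a row, pick a different name for the counter or drop the incoming variable on the SET statement.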

pdkc494949
Calcite | Level 5
My source data does not have _obs or newobs in it. Thanks.
pdkc494949
Calcite | Level 5

Thanks, it works!
