HeatherNewton
data &Portofo._VAL2(keep=MOB_VAL_P MOB);
set &Portofo._VAL;
by MOB;
if first.MOB then CNT=0;
CNT+1;
MOB_VAL_P=CNT/&total_val;
if LAST.MOB;
run;

Does CNT+1; mean CNT=CNT+1?

What is the line 'by MOB' doing here? What does it do?

 

Tom

This statement

cnt+1;

is a SUM statement.  It adds 1 to the value of CNT.  It also makes sure that the value of CNT is retained across iterations of the data step.  That is, it is NOT reset to missing at the start of the next iteration, unlike other variables (such as MOB_VAL_P) that are not sourced from input datasets.

 

Also, when it adds the 1, it will ignore any missing value of CNT (see the SUM() function for more information on this).  Normally, when you do arithmetic with any missing values, the result is missing too.  But the SUM() function used by the sum statement just ignores the missing values instead.
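To make the two points above concrete, here is a minimal sketch, assuming a small hypothetical dataset HAVE that is already sorted by a numeric MOB. It builds the same counter twice: once with the SUM statement and once with an explicit RETAIN plus the SUM() function. Both versions should produce CNT values of 1, 2, 3, 1, 1, 2.

```sas
/* Hypothetical sample data, already sorted by MOB */
data have;
  input MOB;
  datalines;
1
1
1
2
3
3
;
run;

/* Version 1: the SUM statement (implicitly retained, starts at 0) */
data counts_sum;
  set have;
  by MOB;
  if first.MOB then CNT=0;
  CNT+1;
run;

/* Version 2: the same logic spelled out explicitly */
data counts_explicit;
  set have;
  by MOB;
  retain CNT 0;             /* keep CNT across iterations, start at 0 */
  if first.MOB then CNT=0;  /* reset at the start of each MOB group   */
  CNT=sum(CNT, 1);          /* SUM() ignores a missing CNT            */
run;
```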

 

A BY statement tells SAS to process the dataset by the values of the listed variables.  If the data is not actually sorted by those variables, the data step will throw an error and stop.
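A minimal sketch of the usual fix, assuming a hypothetical dataset UNSORTED whose MOB values are out of order: sort it with PROC SORT first, then run the BY-group step against the sorted copy. Running the BY step directly on UNSORTED would stop with an error saying the BY variables are not properly sorted.

```sas
/* Hypothetical unsorted input: MOB values are out of order */
data unsorted;
  input MOB;
  datalines;
2
1
2
1
;
run;

/* Sort first; a BY MOB step run directly on UNSORTED would stop with
   an error that the BY variables are not properly sorted */
proc sort data=unsorted out=sorted;
  by MOB;
run;

/* Now the BY statement is valid and FIRST.MOB / LAST.MOB work as expected */
data grouped;
  set sorted;
  by MOB;
  if first.MOB then CNT=0;
  CNT+1;
run;
```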

 

The use of the BY statement is what creates and populates the FIRST. and LAST. flag variables that are also referenced in the code.  FIRST.var is true on the first observation for the current value of the BY variable named after the FIRST. prefix (within the current values of all of the variables listed before it on the BY statement).  Similarly, the LAST. flag is true only on the last observation of the group.
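One way to see these flags is to copy them into ordinary variables, since FIRST. and LAST. variables are temporary and are not written to the output dataset. This sketch assumes a hypothetical dataset with two BY variables, ID and MOB, to illustrate the "within the values of the variables listed before it" rule: on row 3, MOB has the same value as on row 2, but FIRST.MOB is still 1 because ID changed.

```sas
/* Hypothetical data sorted by ID, then MOB */
data have2;
  input ID MOB;
  datalines;
1 1
1 1
2 1
2 2
;
run;

data flags;
  set have2;
  by ID MOB;
  first_mob = first.MOB;  /* 1 on the first row of each ID/MOB group */
  last_mob  = last.MOB;   /* 1 on the last row of each ID/MOB group  */
  put _n_= ID= MOB= first_mob= last_mob=;
run;
```

Expected flags: FIRST_MOB is 1, 0, 1, 1 and LAST_MOB is 0, 1, 1, 1.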

 

So basically this step is counting how many observations are present for each value of the BY variable and only writing out one observation per value of the BY variable. 
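For comparison, the same per-MOB summary can be sketched in PROC SQL with GROUP BY, which does not require the input to be sorted. This is a swapped-in technique, not what the original step does, and it assumes a hypothetical value for &total_val set with %LET (its definition is not shown in the question).

```sas
/* Hypothetical: in the question, &total_val is defined elsewhere */
%let total_val = 6;

data have;
  input MOB;
  datalines;
1
1
1
2
3
3
;
run;

/* One row per MOB: the count divided by &total_val */
proc sql;
  create table mob_pct as
  select MOB,
         count(*) / &total_val as MOB_VAL_P
  from have
  group by MOB
  order by MOB;
quit;
```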

 

It is also calculating MOB_VAL_P by dividing the count by the value that the referenced macro variable (&total_val) resolves to.  Note that it makes this calculation on every observation, but since only the last observation of each BY group is written, the intermediate values of CNT and MOB_VAL_P are never written out.  You could have avoided some extra division operations by moving the assignment statement after the subsetting IF statement.

 

Also, instead of using the KEEP= dataset option to set the list of variables in the target dataset, you could just use a plain old KEEP statement.

data &Portofo._VAL2;
  set &Portofo._VAL;
  by MOB;
  if first.MOB then CNT=0;
  CNT+1;
  if LAST.MOB;
  MOB_VAL_P=CNT/&total_val;
  keep MOB MOB_VAL_P ;
run;

 

ErikLund_Jensen

Hi @Tom 

 

Sorry to bother you with a question quite unrelated to the problem presented in this post, but I would like to know why you suggest using the plain old KEEP statement instead of the data set option. I wonder if there are any good reasons I have missed, as I always promote the data set options, because:

 

1. The Keep or Drop variable lists specified as options are visually connected to the data set they relate to.

2. The same syntax can be used in DATA and SET statements, and there are no corresponding INKEEP or INDROP statements available.

3. KEEP and DROP statements affect all output data sets, so the data set options are the only way to specify different variable lists for different outputs.
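A minimal sketch of point 3, assuming a hypothetical dataset HAVE with MOB and AMOUNT: one step writes two outputs, each with its own KEEP= list. A KEEP statement in this step would apply to both outputs instead.

```sas
data have;
  input MOB amount;
  datalines;
1 10
1 20
2 30
;
run;

/* Each output gets its own variable list; with no explicit OUTPUT
   statement, every observation is written to both datasets */
data amounts(keep=MOB amount)
     mob_only(keep=MOB);
  set have;
run;
```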

 

I recently spent time trying to find a missing output variable caused by a DROP statement somewhere around line 600 of a data step. The step had many DROP statements placed all over the code. During development it makes sense to put the DROP statement in the section of the code where the variable is created, so the drop is not forgotten. But programs are easier to maintain with a consistent use of syntax, which calls for data set options, because they are unavoidable in some cases.

Tom

I follow the KISS principle.  Keep It Simple, Stupid.

 

Dataset options are a complication. Sometimes the complication is useful and so worth it.  The example program is not one of those cases.

 

1) Most data steps only create one output dataset.

2) Not sure how that point relates, especially to this program, or really any normal simple data step. There are LOTS of dataset options.  

3) See (1).

 

LinusH
One example would be using them on datasets in the SET statement, which would make the PDV smaller and hence make the program use fewer resources.
Data never sleeps
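A sketch of that idea, assuming a hypothetical input WIDE that carries extra variables the step does not need: KEEP= on the SET statement means AMOUNT and NOTE are never read into the PDV, so only MOB and the variables the step creates end up in COUNTS.

```sas
/* Hypothetical wide input: only MOB is needed downstream */
data wide;
  input MOB amount note $;
  datalines;
1 10 a
1 20 b
2 30 c
;
run;

/* KEEP= on SET: AMOUNT and NOTE are never read into the PDV */
data counts;
  set wide(keep=MOB);
  by MOB;
  if first.MOB then CNT=0;
  CNT+1;
  if last.MOB;
run;
```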
Tom

@LinusH wrote:
One example would be using them on datasets in the SET statement, which would make the PDV smaller and hence make the program use fewer resources.

Yes. And that is a totally different use case from the data step in this question.

HeatherNewton

So where I put the statement 'if last.mob' is crucial.

If I put it above the statement 'if first.mob then cnt=0;', the result would be different: it would only count the last entry of each MOB value. Am I correct?

I always thought it did not matter where I put the subsetting 'if ...' condition within the data step, but it actually does matter in this case.

LinusH
Yes, it does matter: for observations that fail the subsetting IF, the statements after it are not executed.
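A minimal sketch of what happens when the subsetting IF is moved to the top, assuming hypothetical data shaped like the question's. In WRONG, rows that are not the last of their group return to the top of the step before the reset and CNT+1 run, so CNT is incremented only once per group. Because the reset only fires for single-row groups, the value can also carry over from one group to the next: WRONG should end up with CNT values 1, 1, 2 where RIGHT has 3, 1, 2.

```sas
data have;
  input MOB;
  datalines;
1
1
1
2
3
3
;
run;

/* Correct order: count every row, then keep only the last row per group */
data right;
  set have;
  by MOB;
  if first.MOB then CNT=0;
  CNT+1;
  if last.MOB;
run;

/* Subsetting IF at the top: rows failing LAST.MOB skip the reset and CNT+1 */
data wrong;
  set have;
  by MOB;
  if last.MOB;
  if first.MOB then CNT=0;
  CNT+1;
run;
```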
Tom

@HeatherNewton wrote:

So where I put the statement 'if last.mob' is crucial.

If I put it above the statement 'if first.mob then cnt=0;', the result would be different: it would only count the last entry of each MOB value. Am I correct?

I always thought it did not matter where I put the subsetting 'if ...' condition within the data step, but it actually does matter in this case.

 


I suspect you were confusing the subsetting IF statement with the WHERE statement.  The WHERE statement (or WHERE= dataset option) limits which observations are read from the input dataset, so it is not executable, and in a simple step it does not matter where you place it.  But the positioning of a WHERE statement can matter in a complex data step that has more than one SET (or MERGE or UPDATE) statement.
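A small sketch of the difference, assuming a hypothetical dataset with a FLAG variable. The WHERE statement is deliberately placed after CNT+1: because it is not executable, rows with FLAG ne 1 are never read, so CNT counts only the rows that pass (1, 2). With the subsetting IF, every row is read and counted before the IF discards some of them, so the surviving rows carry CNT values 2 and 4.

```sas
data have;
  input MOB flag;
  datalines;
1 0
1 1
2 0
2 1
;
run;

/* WHERE is declarative: filtered rows are never read, wherever it appears */
data with_where;
  set have;
  CNT+1;
  where flag = 1;
run;

/* Subsetting IF is executable: every row is read and counted first */
data with_if;
  set have;
  CNT+1;
  if flag = 1;
run;
```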
