jc3992
Pyrite | Level 9

Hello everyone,

This question comes from one of my assignments, and even after checking the answer provided I still have a question.

 

The data set is shown below.

The labels are:

Month Date Team Hits Runs Status

6-19 Columbia Peaches      8  3 Complete
6-20 Columbia Peaches     10  5 Complete
6-23 Plains Peanuts        3  4 Complete
6-24 Plains Peanuts        7  2 Complete
6-25 Plains Peanuts       12  8 Complete
6-30 Gilroy Garlics        .  . No Data
7-1  Gilroy Garlics        .  . No Data
7-4  Sacramento Tomatoes  15  9 Complete
7-4  Sacramento Tomatoes  10 10 Complete
7-5  Sacramento Tomatoes   2  3 Complete

The question was:

  1. You want to accumulate the maximum number of runs that you know about. In this case, for example, record 6 should list MaxRuns=8.
  2. You only want to accumulate the total number of runs to date until you don't have information; when this happens you want to set RunsToDate to a missing value. In this case, for example, record 6 should list RunsToDate=.

The code was as below:

data mydata;
infile "&dirdata/Week_5/Games_Plus.dat" truncover;
input Month 1 Day 3-4 Team $6-24 Hits 27-28 Runs 30-31 Status $9.;
retain MaxRuns RunstoDate 0 ;
MaxRuns=Max(MaxRuns, Runs);
RunsToDate=RunsToDate+Runs;
run;

proc print data=mydata;
title "Season's Record to Date, with Missing Values";
run;

My question is about the RETAIN statement:

At that point in the code I had not yet created variables named "MaxRuns" and "RunsToDate".

However, SAS seems to know about them anyway.

I also did not understand MaxRuns: why is it written as MaxRuns=Max(MaxRuns, Runs) instead of simply MaxRuns=Max(Runs)?

And I think "RunsToDate=RunsToDate+Runs" works because RunsToDate of record 3 = RunsToDate of record 2 + Runs of record 3, and so on.

 

I guess I did not really understand this question.

I wonder if anyone understands this and would be willing to guide me a little.

Thanks a lot!:)

7 REPLIES
Kurt_Bremser
Super User

Contrary to SQL, where a summary function like max() can work over all rows, a data step (and a data step function) always deals with the current observation only. So you need to compare the current value with the retained summary value.
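
A minimal sketch of that comparison, assuming the Runs values from the question are typed inline with DATALINES instead of being read from Games_Plus.dat (the data set name running_max is made up for illustration):

```sas
/* Sketch only: Runs values copied from the question's ten records */
data running_max;
   input Runs @@;                  /* read one Runs value per iteration */
   retain MaxRuns 0;               /* MaxRuns keeps its value across iterations */
   MaxRuns = max(MaxRuns, Runs);   /* compare the current Runs with the retained maximum;
                                      MAX ignores a missing argument */
   datalines;
3 5 4 2 8 . . 9 10 3
;
run;

proc print data=running_max;
run;
```

With these values, records 6 and 7 (where Runs is missing) should still show MaxRuns=8, which is what part 1 of the assignment asks for.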

jc3992
Pyrite | Level 9

Thank you very much! Now I understand. Thanks~

Kurt_Bremser
Super User

The retained variables are not stored in a different section of memory; they are part of the PDV (program data vector) like all other non-automatic variables. The data step always treats variables in the PDV as follows:

  • variables from input datasets are retained (so when one observation of dataset A is merged with several observations of B, the values of A persist)
  • newly created variables are always set to missing at the start of a new datastep iteration, unless they are named in a retain statement or a summation statement of the form 
    x + n;
    (x will be retained, n is any numeric expression)
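
The three cases can be put side by side in a small sketch. The variable names Total and NotKept and the data set name compare are made up for illustration, and the Runs values are typed inline rather than read from the original file:

```sas
/* Sketch only: contrasts RETAIN + assignment, a sum statement, and a non-retained variable */
data compare;
   input Runs @@;
   retain RunsToDate 0;
   RunsToDate = RunsToDate + Runs;   /* the + operator propagates missing values, so this
                                        becomes missing at record 6 and stays missing */
   Total + Runs;                     /* sum statement: Total is retained automatically,
                                        starts at 0, and skips a missing Runs */
   NotKept = NotKept + Runs;         /* not retained: reset to missing at the start of
                                        every iteration, so it is always missing */
   datalines;
3 5 4 2 8 . . 9 10 3
;
run;

proc print data=compare;
run;
```

Note that the sum statement would not give what part 2 of the assignment asks for: because it ignores missing values, Total keeps counting after the "No Data" records, whereas the explicit RunsToDate=RunsToDate+Runs goes missing there.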

HTH

mkeintz
PROC Star

@Kurt_Bremser

 

Retained variables are indeed part of the pdv, but that does not mean they have the same memory address as when they are not retained.  This is what I meant by "section" of memory.

 

For simplicity let me restrict my example to numeric variables that are newly created in the data step.

 

Consider the impact on the address of variable W below.  W is retained in the second data step but not the first, and as a result has a different memory address.   In fact, if you have a number of new variables, and retain a subset, I have never seen a retained variable in a memory location contiguous to the non-retained vars.  Instead they are contiguous to each other (separated by 8 bytes needed for numeric variables).   And the non-retained vars are similarly contiguous to each other. 

 

This is why I believe it is a useful paradigm to consider the retain statement as a memory-location assignment statement.

 

Even so, as you have noted, they are in the PDV, and non-retained and retained variables can be logically contiguous, which is handy for programming statements such as array declarations.

 

 

 

 

data _null_;
   set sashelp.class;
   a=age;
   w=weight;
   h=height;

   ada=addrlong(a);
   adw=addrlong(w);
   adh=addrlong(h);
   put (ad:) (=$hex16. /);
   stop;
run;

data _null_;
   set sashelp.class;
   a=age;
   w=weight;
   h=height;
   retain w;
   ada=addrlong(a);
   adw=addrlong(w);
   adh=addrlong(h);
   put (ad:) (=$hex16. /);
   stop;
run;

 

 

My system is Windows, which has "little endian" addresses (the low-order byte is printed first), so addresses such as

   6840750600000000

   7040750600000000

are "contiguous", i.e.

     location 6840750600000000  is followed by
     location 6940750600000000  is followed by
     location 6A40750600000000  is followed by
     location 6B40750600000000  is followed by
     location 6C40750600000000  is followed by
     location 6D40750600000000  is followed by
     location 6E40750600000000  is followed by
     location 6F40750600000000  is followed by
     location 7040750600000000

providing 8 bytes for the numeric variable at the first address.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

The retain statement can precede the corresponding value assignment statement (although the retain statement also has the option of assigning an initial value). Note that, because of this, you can declare a retained variable even though not only its value but also its type (numeric vs. character) is not evident until a subsequent statement.
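
A small sketch of that point, using a made-up variable name (Seen) and a few inline values; it assumes the first assignment is what fixes the variable's type and length:

```sas
/* Sketch only: RETAIN appears before anything tells SAS what type Seen is */
data _null_;
   retain Seen;                      /* no initial value, so no type is implied here */
   input Runs @@;
   if _n_ = 1 then Seen = 'first';   /* first assignment: Seen becomes character, length 5 */
   put _n_= Runs= Seen=;             /* Seen keeps 'first' on later iterations */
   datalines;
3 5 4
;
run;
```

The log should show Seen=first on every record even though it is assigned only once, because the retain statement stops it from being reset to missing.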

 

jc3992
Pyrite | Level 9

Thank you. It is a bit complicated, but I think I will figure it out once I am more familiar with it. Thanks!

mkeintz
PROC Star

Think of it this way.  When the SAS data step encounters a retain statement, it turns out that the retained variables are stored in a different part of memory than non-retained variables.  This can be demonstrated with the ADDRLONG function, which I don't propose to describe here.

 

So in a way, all the retain statement apparently does is assign a variable's location in memory to a region which the data step does not reset to missing with each new record.
