I'm studying towards the SAS advanced programmer certification and I'm trying to make sense of this piece of code in the preparation guide (p. 545):
data work.lookup1;
  array Targets{1997:1999,12} _temporary_;
  if _n_=1 then do i= 1 to 3;
    set sasuser.ctargets;
    array Mnth{*} Jan--Dec;
    do j=1 to dim(mnth);
      targets{year,j}=mnth{j};
    end;
I understand the first parts: we create a data set and an array with three rows and twelve columns. I will assume that the rows in the array work as "observations" somehow so that the if-then statement can execute. We then create a new array "in" the do-step, and now I would need some clarification:
1) Does using an asterisk automatically copy "all" of the values in the data set or only all the values for a single row (so that all values for both dimensions would be something like (*,*)?)
2) The prep guide claims that the "elements" of the array need to be defined when using an asterisk, hence the "Jan--Dec"-part. But what constitutes an "element"? Why is only "month" included as an "element" and no other variables?
3) When the do-step executes for the second and third time, how does the code make sure we fetch the data for the correct year (so that we don't get three identical rows?)
Kind regards/Magnus
1. No. The asterisk has no effect on the copying, it means the size of the array is not explicitly declared.
2. Because that's what the user wanted to load into the array values. Elements of arrays are the values stored within.
3. Year is a variable in the data set ctargets and note the array was created with YEAR as the third row. So if there are duplicate years it would overwrite the values, but that's how it knows what row to put the data in. Month controls what column the data goes into.
Note that this is not the 'very basics of an array'. This is a temporary array that is loaded on the fly and stays in memory and isn't a common usage of arrays in SAS.
I'm sorry but this is still somewhat unclear:
1. No. The asterisk has no effect on the copying, it means the size of the array is not explicitly declared.
2. Because that's what the user wanted to load into the array values. Elements of arrays are the values stored within.
If this were correct, wouldn't only the names of the months be stored in the array (Jan--Dec)? I know from the book that the target "values" for each month are getting stored. Also, the book seems to make a distinction between values and elements?
"The ARRAY statement within the DO loop creates the Mnth array, which stores the values from Sasuser.Ctargets. The dimension of the Mnth array is specified using an asterisk, which enables SAS to automatically count the array elements.Note:If you use an asterisk to specify the dimensions of an array, you must list the array elements."
Since each specific "value" isn't being listed (e.g "192284420", "86376721"), this seems to indicate that values and element are indeed different concepts?
3. Year is a variable in the data set ctargets and note the array was created with YEAR as the third row. So if there are duplicate years it would overwrite the values, but that's how it knows what row to put the data in. Month controls what column the data goes into.
It seems like I'm missing something essential. Where does the code specify the third row to equal "year"? As far as I can see we have two arrays: Targets, with two unnamed dimensions, and Mnth, the properties of which are uncertain.
I think this line is the most confusing:
targets{year,j}=mnth{j};
This line seems to indicate that for each unique month of each unique year, the value of targets is set to the value of months. I can't quite grasp "how" this is done, however. There is no variable named "year" in targets. Does SAS automatically interpret four-digit numbers as belonging to a "year" variable? Or is the line telling the targets array to load the value of "year" (which has somehow implicitly been added) from the mnth array?
It seems like I'm missing something fundamental, hence my reference to "the very basics" of working with arrays.
Kind regards/Magnus
Arrays are only short cut references to variables in SAS, they are not objects on their own. I'm likely mixing elements and values. Elements are likely variable names and values would be the actual value. So an Element would be year, January, February, etc.
CTARGETS likely looks like this:
Year Jan Feb Mar .... Dec
1997 11 12 ..
1998 21 22 ..
1999 31 32 ..
array Targets{1997:1999,12} _temporary_;
This declares a TEMPORARY array named Targets that is 3 x 12. Note that the first dimension is indexed by 1997, 1998 and 1999 -> years.
_temporary_ means it doesn't create any variables to stay in the data set, and it will not exist after the data step. This also loads it into memory, and it persists across rows, unlike a typical array structure in SAS.
array Mnth{*} Jan--Dec;
This creates an array using the elements or variables from the ctargets data set which are the months. Jan to December. The -- notation means it takes all the columns between January to December which is 12 months. So to refer to March, you can now use mnth(3) because the third month is march.
do j=1 to dim(mnth);
targets{year,j}=mnth{j};
end;
Do loops over the months here.
So for the first row of ctargets, where i=1, year=1997
The explicit loop with J loads each month's data into the array.
do j=1 to dim(mnth);
The outer loop, I, is the number of rows of the data set and uses that to loop over the rows of the data set. The _n_=1 ensures this only happens on the first record, so you're loading the data set at the beginning and doing this only once.
if _n_=1 then do i= 1 to 3;
targets(1997, 1) = 11
targets(1997, 2) = 12
....
targets(1999, 2) = 32
This is now loaded into an array that you can access within the data step by identifying the year and month indexes.
Not sure if that's organized well, but hopefully that helps. If you want to try and trace it out in more detail, try adding some PUT statements so you can see the values at different time points. Untested of course:
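data work.lookup1;
  array Targets{1997:1999,12} _temporary_;
  if _n_=1 then do i= 1 to 3;
    set sasuser.ctargets;
    array Mnth{*} Jan--Dec;
    do j=1 to dim(mnth);
      /* trace which row (I), month (J) and year is being loaded */
      put 'I=' i 'J=' j 'Year=' year 'Month=' mnth{j};
      targets{year,j}=mnth{j};
    end;
  end; /* closes the I loop, missing in the original post */
run;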
data work.lookup1;
Define a data step that will write to the dataset named LOOKUP1 in the WORK library.
array Targets{1997:1999,12} _temporary_;
Define an array (of variables) that uses 2 dimensions to index. The first index uses index values from 1997 to 1999 (so 3 "rows"). The second index uses index values from 1 to 12 (so 12 "columns"). The actual variables used are temporary variables.
if _n_=1 then do i= 1 to 3;
On the first iteration of the data step, start a DO loop that uses iteration variable I going from 1 to 3. So run the block of code up to the matching END (which you didn't include) three times.
set sasuser.ctargets;
Read data from the dataset CTARGETS in the library SASUSER.
array Mnth{*} Jan--Dec;
Define an array named MNTH that uses the variables in the range JAN to DEC, based on their position in the data set being generated. The asterisk means that the SAS compiler will figure out how many variables are in this array based on how many names are listed. So this could be just two variables if DEC is the variable that comes immediately after JAN. I would assume that SASUSER.CTARGETS has the variables JAN FEB MAR .... DEC defined in order without any other variables mixed in between.
do j=1 to dim(mnth);
Start another DO loop using J as the iteration variable. Go from one to the number of variables that the MNTH array contains.
targets{year,j}=mnth{j};
Take the value of the Jth variable in the MNTH array and assign it to the Jth column in the TARGETS array, using the variable YEAR to specify which row. It was important to define the limits for the first index when defining the TARGETS array so that 1999 means the third row instead of the 1,999th row.
end;
This ends the J loop.
end;
This one is missing in your post and it will end the I loop.
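If you want to check two of the points above in the log, here is a minimal sketch. The variables JAN, DEC, FEB and the names LO, HI, NROWS, NMONTHS are made up for illustration and are not from the prep guide. It prints the index bounds of TARGETS, and shows that a name range list like Jan--Dec depends on the order in which the variables are defined.

```sas
data _null_;
  /* Same declaration as in the prep guide: first index runs 1997..1999 */
  array Targets{1997:1999,12} _temporary_;
  lo    = lbound(Targets, 1);   /* lowest valid first index  */
  hi    = hbound(Targets, 1);   /* highest valid first index */
  nrows = dim(Targets, 1);      /* number of "rows"          */
  put lo= hi= nrows=;

  /* Illustration only: DEC is defined right after JAN, so Jan--Dec
     covers just those two variables. FEB is defined later. */
  Jan = 1; Dec = 12; Feb = 2;
  array Mnth{*} Jan--Dec;
  nmonths = dim(Mnth);
  put nmonths=;
run;
```

The log should show lo=1997 hi=1999 nrows=3 and then nmonths=2, which is why the real code relies on CTARGETS storing JAN through DEC next to each other.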
Okay, can I interpret the code like this?
data work.lookup1;
  array Targets{1997:1999,12} _temporary_;
  if _n_=1 then do i= 1 to 3;
    set sasuser.ctargets;
    array mnth{*} Jan--Dec;
    do j=1 to dim(mnth);
      targets{year,j}=mnth{j};
    end;
First the data set is created, with an "empty" array with three rows and twelve columns.
Then a condition is set for when we execute the do-loop
if _n_=1 then do i= 1 to 3;
But since we don't "have" any observations yet, _n_=1 thus can't refer to our array, but must refer to the observations being read from sasuser.ctargets using the SET statement.
set sasuser.ctargets;
So for each observation that is read from sasuser.ctargets, we create a one-dimensional array containing the values for that specific row.
array mnth{*} Jan--Dec;
This row of values is then fed into the Targets array, along with the value of year, which is not read from mnth, but from ctargets for that specific observation (though this seems quite superfluous).
do j=1 to dim(mnth); targets{year,j}=mnth{j}; end;
We then repeat the process for each observation. Since mnth is only reading one observation at a time, and since there is only one observation per year, there is no need for the mnth array to reference the year variable.
Note: My printscreen function is malfunctioning, but the ctargets data set looks something like this:
obs year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1997 192284420 86376721 28526103 260386468 109975326 102833104 196728648 236996122 112413744 125401565 72551855 136042505
2 1998 108645734 147656369 202158055 41160707 264294440 267135485 208694865 83456868 286846554 275721406 230488351 24901752
3 1999 85730444 74168740 39955768 312654811 318149340 187270927 123394421 34273985 151565752 141528519 178043261 181668256
First the data set is created, with an "empty" array with three rows and twelve columns.
All data STEPS start with no variables defined until the compiler sees something in the code for the data step that will let it know what variables will be used. Once the whole step is compiled then the compiler knows what variables will be written to the target dataset.
All variables are EMPTY until you assign them some values.
But since we don't "have" any observations yet, _n_=1 thus can't refer to our array,
The automatic variable _N_ has nothing to do with either an array or dataset observations. It is just a count of the number of times the data step has iterated. Since you have a DO loop around the SET statement in this data step, the first iteration will read three observations from SASUSER.CTARGETS.
So for each observation that is read from sasuser.ctargets, we create a one-dimensional array containing the values for that specific row.
The array MNTH consists of the variables JAN, FEB, ... DEC. When you read a new observation from CTARGETS the values of the VARIABLES are changed. So when I=2 the reference MNTH[12] refers to the value of DEC read from the second observation in CTARGETS.
along with the value of year,
The value of YEAR is set when an observation is read from CTARGETS. (Assuming that CTARGETS actually does have that variable; otherwise it will be missing and the array reference will fail with an invalid index.) It is not loaded into the array. It is used to index into the array to figure out which VARIABLE will get the value that is being assigned.
My printscreen function is malfunctioning
I am glad, for our sake. Please do not post photographs of your data. Copy the data as TEXT and use the Insert Code or Insert SAS Code icons on the tool bar to pop up a window that you can paste the values into.
Remember, in SAS an ARRAY does not actually contain data. It is just a way to define a method to reference multiple actual VARIABLES (which will contain the data) by using a single name plus one (or more) indexes into the array. Even the _TEMPORARY_ array defines actual variables; it just doesn't have to give them names, since it doesn't need to write them to the output dataset and your code cannot access them directly.
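A small sketch of that last point, using made-up values rather than the CTARGETS data (the variable X is also just for illustration): assigning through the array changes the underlying variable, and changing the variable changes what the array reference returns.

```sas
data _null_;
  Jan = 10; Feb = 20; Mar = 30;
  array Mnth{*} Jan Feb Mar;

  Mnth{2} = 99;   /* writes to the variable FEB */
  put Feb=;       /* Feb=99 */

  Mar = 7;        /* change the variable directly */
  x = Mnth{3};    /* the array reference sees the new value */
  put x=;         /* x=7 */
run;
```

This is why reading a new observation with SET is enough to "refresh" MNTH: the array has no copy of its own to update.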
The automatic variable _N_ has nothing to do with either an array or dataset observations. It is just a count of the number of times the data step has iterated. Since you have a DO loop around the SET statement in this data step, the first iteration will read three observations from SASUSER.CTARGETS.
Okay, but what exactly "constitutes" the first iteration of the data step, if it's not the reading of the first observation in ctargets via the SET statement?
In your data step the first executable statement is the IF statement. The SET statement is also an executable statement. So how many observations are read by that SET statement is not what _N_ is counting. You could execute the SET statement multiple times in a single iteration (or no times).
Since you only posted part of a data step it is not clear what the whole step was but it was probably something like this:
data want;
  array lookups [1000,12] _temporary_;
  if _n_=1 then do row=1 to nobs;
    set lookups nobs=nobs;
    array onerow jan -- dec;
    do col=1 to dim(onerow);
      lookups[row,col] = onerow[col];
    end;
  end;
  set realdata;
  newvar=lookups[row_index,col_index];
run;
So on the first iteration the data step will populate the LOOKUPS array, then proceed to read the first observation of the real data and use the lookup array for something. At the end there will be an implied output to write an observation. On the second iteration it will skip reading the lookup data and go straight to reading the next observation from REALDATA. So in THIS data step, if REALDATA has 10 observations the data step will iterate 11 times. On the last iteration it will read past the end of the REALDATA dataset and stop at that point, never reaching the implied output statement. So the result is 11 iterations of the data step and 10 observations written.
But you could also write a data step like this.
data want;
  do i=1 to 10;
    output;
  end;
run;
This data step will iterate only one time and write out 10 observations. When it gets to the end SAS will notice that it has not read any inputs and know that it does not need to iterate again to look for new input.
Okay, the entire code is as follows:
data work.lookup1;
  array Targets{1997:1999,12} _temporary_;
  if _n_=1 then do i= 1 to 3;
    set sasuser.ctargets;
    array mnth{*} Jan--Dec;
    do j=1 to dim(mnth);
      targets{year,j}=mnth{j};
    end;
  end;
  set sasuser.monthsum(keep=salemon revcargo monthno);
  year=input(substr(salemon,4),4.);
  Ctarget=targets{year,monthno};
  format ctarget dollar15.2;
run;
Are you saying that the do-loop activates when SAS starts reading observations from the second data set (sasuser.monthsum)?
Are you saying that the do-loop activates when SAS starts reading observations from the second data set
No.
The flow of the execution of the statements in a data step is from the top to the bottom (unless you add some GOTO statements).
The DO loop starts when the IF condition evaluates to TRUE. Which will only happen on the first pass through the data step execution.
No records are read from the second dataset until the IF statement (and the DO loop inside of it when the condition is true) has finished running. Each time the statement:
set sasuser.monthsum(keep=salemon revcargo monthno);
executes it reads an observation from that dataset. Which will change the values of the three variables that are read from it.
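If you do not have the SASUSER data sets at hand, here is a self-contained sketch of the same pattern that you can run and trace in the log. Everything in it is an assumption for illustration: the data sets WORK.CTARGETS_DEMO and WORK.MONTHSUM_DEMO, the three-month layout and the values are made up, and YEAR is read directly instead of being derived from SALEMON.

```sas
/* Hypothetical miniature stand-in for SASUSER.CTARGETS */
data work.ctargets_demo;
  input Year Jan Feb Mar;
  datalines;
1997 11 12 13
1998 21 22 23
1999 31 32 33
;
run;

/* Hypothetical miniature stand-in for SASUSER.MONTHSUM */
data work.monthsum_demo;
  input Year MonthNo;
  datalines;
1997 1
1998 3
1999 2
;
run;

data work.lookup_demo;
  array Targets{1997:1999,3} _temporary_;
  put 'Top of iteration: ' _n_=;    /* runs once per data step iteration */
  if _n_=1 then do i=1 to 3;        /* true only on the first iteration */
    set work.ctargets_demo;         /* executes three times in iteration 1 */
    array Mnth{*} Jan--Mar;
    do j=1 to dim(Mnth);
      Targets{Year,j}=Mnth{j};
    end;
  end;
  set work.monthsum_demo;           /* executes once in every iteration */
  Ctarget=Targets{Year,MonthNo};
  put Year= MonthNo= Ctarget=;
  drop i j Jan--Mar;
run;
```

The log should show the "Top of iteration" line four times (_N_=1 through _N_=4) and three looked-up rows with Ctarget equal to 11, 23 and 32. The fourth iteration stops at the second SET statement because there is nothing left to read.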
My logic here is that the second SET statement is not part of the DO loop and that it should execute independently, thus causing the first iteration of the data step and thus triggering the DO loop.
If this is not so, then when exactly is the IF statement evaluated to be true?
The _N_ automatic variable is incremented each time a new iteration starts. It has nothing to do with SET or any other statements your data step includes.
Try these two programs:
data _null_;
  put _n_=;
  set sashelp.class (obs=3);
  put name=;
run;
data _null_;
  put _n_=;
  do i=1 to 2;
    put _n_= i=;
    set sashelp.class (obs=3);
    put name=;
  end;
run;
2469 data _null_;
2470 put _n_=;
2471 set sashelp.class (obs=3);
2472 put name=;
2473 run;
_N_=1
Name=Alfred
_N_=2
Name=Alice
_N_=3
Name=Barbara
_N_=4
NOTE: There were 3 observations read from the data set SASHELP.CLASS.
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds
2474 data _null_;
2475 put _n_=;
2476 do i=1 to 2;
2477 put _n_= i=;
2478 set sashelp.class (obs=3);
2479 put name=;
2480 end;
2481 run;
_N_=1
_N_=1 i=1
Name=Alfred
_N_=1 i=2
Name=Alice
_N_=2
_N_=2 i=1
Name=Barbara
_N_=2 i=2
NOTE: There were 3 observations read from the data set SASHELP.CLASS.
NOTE: DATA statement used (Total process time):
      real time 0.00 seconds
      cpu time 0.00 seconds
I will try to repeat my question:
If the first DO-group won't execute until the IF-statement is true, and if no observations are read from sasuser.monthsum until the IF-statement evaluates to true, how can _n_ ever be evaluated to one? This seems like circular reasoning to me. Does the DATA-step iterate before any of these steps and in that case where?