BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello
I have 3 questions please-

 

question 1-
Why if I delete  sentences pop7=0; failure=0;
then calculated varaibles POP7 and FAILURE be null in first row?


question 2-
Why sentences   pop7=0;failure=0;
are located before do until?
 
question 3-
Why this code create one row for each customer?
I didnt use end=last in set statement and didnt write If last=1
so I dont understand why I have one row for each customer?
 

data have;
format ddate date9.;
input CustID ddate :date9.  pop Ind_Fail;
cards;
1 '01APR2022'd 7 0
1 '01MAY2022'd 7 0
1 '01JUN2022'd 7 0
1 '01JUL2022'd 7 0
1 '01AUG2022'd 7 0
1 '01SEP2022'd 7 0
1 '01OCT2022'd 7 0
1 '01NOV2022'd 8 0
2 '01APR2022'd 7 0
2 '01MAY2022'd 7 0
2 '01JUN2022'd 7 1
2 '01JUL2022'd 7 0
2 '01AUG2022'd 7 0
2 '01SEP2022'd 7 0
2 '01OCT2022'd 7 0
2 '01NOV2022'd 7 1
;
Run;


data want;
pop7=0;
failure=0;
do until(last.CustID);
set have;
by CustID;
if (ddate='01NOV2022'd and pop=7) then POP7=1;
if ddate in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd) and Ind_Fail=1 then FAILURE=1;
end;
run;
8 REPLIES 8
ballardw
Super User

question 1-
Why if I delete  sentences pop7=0; failure=0;
then calculated varaibles POP7 and FAILURE be null in first row?

 

Likely because the conditional assignments of 1 to them are not true and no value assigned

 

question 2-
Why sentences   pop7=0;failure=0;
are located before do until?

 

So they don't get reset to zero constantly by the do until.

Otherwise any of the 1 values that had been assigned would be reset to 0 before the next record was read with the set.

 

question 3-
Why this code create one row for each customer?

There is no OUTPUT inside this loop:

do until(last.CustID);
set have;
by CustID;
if (ddate='01NOV2022'd and pop=7) then POP7=1;
if ddate in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd) and Ind_Fail=1 then FAILURE=1;
end;

so it keeps executing and assigning until the LAST.CUSTID is true, or the last record of each customer is used.

Then the values keep through the loop are the ones written to the output data set.

 

Ronein
Meteorite | Level 14
Thanks.
Regarding question 3- let's talk about customer 1.
There are 8 rows for customer 1.
Are there 8 iterations for customer 1?
As I understand in each iteration there is set of one row and the next row delete the previous row and so on so at the end only the last row left? Is it true? How can I understand and see the process? Is it possible to see what sas do in each step ?( I mean to see the data in eqxh step)? Regarding question 1- why is the purpose of initiate values 0 for 2 variables? Is outcome be different if we don't do it?
ballardw
Super User

@Ronein wrote:
Thanks.
Regarding question 3- let's talk about customer 1.
There are 8 rows for customer 1.
Are there 8 iterations for customer 1?
As I understand in each iteration there is set of one row and the next row delete the previous row and so on so at the end only the last row left? Is it true? How can I understand and see the process? Is it possible to see what sas do in each step ?( I mean to see the data in eqxh step)? Regarding question 1- why is the purpose of initiate values 0 for 2 variables? Is outcome be different if we don't do it?

Show me where you think anything is deleted. Really, show me.

SAS does have a DELETE statement to remove observations but there is not one in this code. So I don't know why you think anything is deleted. Values for the two variables can only change in two places, one for each variable and there isn't anything removed.

 

This is a very simple example of what is going on:

data example;
   x=3;
   do y=1 to 5;
      put 'at top    ' x= y=;
      if y=3 then x=1;
      put 'at bottom ' x= y=;

   end;
run;

Change the x to missing before the DO loop and see the values change. Nothing is deleted  but is kept.

 

Regarding question 1- why is the purpose of initiate values 0 for 2 variables? Is outcome be different if we don't do it?

Didn't you already run code that way and get missing values for at least one customer? Isn't that why you asked the question. Yes you get different results. Your requirement was to assign 0 or 1. There has to be some place to set it. Since SAS executes from top to bottom the order matters. Try different places with sets you know the expected result. See what happens.

 

This particular code keeps values which is why the timing of the initialization to zero of the variables is important.

 

 

How can I understand and see the process? Is it possible to see what sas do in each step ?

 

Look in your online help for how to start and use the SAS DATA Step Debugger. This let's you step through code seeing the result of the variables you specify. Search for that specific phrase "Data step debugger".

 

Suggestion: Do not use a big data set that has a small number of the cases you are interested in. Make a small set, maybe 3 or 4 customers with no more than 5 observations each. Make sure some get into the assigning 1's and others don't. Otherwise you are likely to not catch the changes because otherwise nothing much changes with sort of code.

 

This sort of code even has a name. Look at https://support.sas.com/resources/papers/proceedings12/052-2012.pdf for just one set of uses.

 

 

Tom
Super User Tom
Super User

The data step iterates until something stops it.  The normal way it stops is when it reads past the end of the input.  But other things can stop a data step.  The code could execute a STOP statement. SAS could detect an error that forces it to stop the data set.

 

The SET statement (at least a simple one like in this code) reads one observation each time it executes.

 

So placing the SET inside the DO loop means that on each iteration of the data step the SET statement executers once for every observation with that value of CUSTID.  Nothing is deleted, but when an observation is read from the dataset then the values of the variables change to reflect what was read.  Just like the value or variable changes when you execute an assignment statement.

 

To see what it going on add some PUT statements and re-run the code.

 

Spoiler
data want;
put '(1)  ' (_n_ custid ddate ind_fail pop pop7 failure) (=);
  pop7=0;
  failure=0;
put '(2)  ' (_n_ custid ddate ind_fail pop pop7 failure) (=);
  do until(last.CustID);
    set have;
    by CustID;
put '(3)  ' (_n_ custid ddate ind_fail pop pop7 failure last.custid) (=);
    if (ddate='01NOV2022'd and pop=7) then POP7=1;
    if ddate in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd) and Ind_Fail=1 then FAILURE=1;
put '(4)  ' (_n_ custid ddate ind_fail pop pop7 failure) (=);
  end;
put '(5)  ' (_n_ custid ddate ind_fail pop pop7 failure) (=);
run;
(1)  _N_=1 custid=. ddate=. ind_fail=. pop=. pop7=. failure=.
(2)  _N_=1 custid=. ddate=. ind_fail=. pop=. pop7=0 failure=0
(3)  _N_=1 custid=1 ddate=01APR2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=1 custid=1 ddate=01APR2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=1 custid=1 ddate=01MAY2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=1 custid=1 ddate=01MAY2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=1 custid=1 ddate=01JUN2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=1 custid=1 ddate=01JUN2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=1 custid=1 ddate=01JUL2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=1 custid=1 ddate=01JUL2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=1 custid=1 ddate=01AUG2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=1 custid=1 ddate=01AUG2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=1 custid=1 ddate=01SEP2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=1 custid=1 ddate=01SEP2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=1 custid=1 ddate=01OCT2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=1 custid=1 ddate=01OCT2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=1 custid=1 ddate=01NOV2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=1
(4)  _N_=1 custid=1 ddate=01NOV2022 ind_fail=0 pop=7 pop7=1 failure=0
(5)  _N_=1 custid=1 ddate=01NOV2022 ind_fail=0 pop=7 pop7=1 failure=0
(1)  _N_=2 custid=1 ddate=01NOV2022 ind_fail=0 pop=7 pop7=. failure=.
(2)  _N_=2 custid=1 ddate=01NOV2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=2 custid=2 ddate=01APR2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=2 custid=2 ddate=01APR2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=2 custid=2 ddate=01MAY2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=2 custid=2 ddate=01MAY2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=2 custid=2 ddate=01JUN2022 ind_fail=1 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=2 custid=2 ddate=01JUN2022 ind_fail=1 pop=7 pop7=0 failure=1
(3)  _N_=2 custid=2 ddate=01JUL2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=2 custid=2 ddate=01JUL2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=2 custid=2 ddate=01AUG2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=2 custid=2 ddate=01AUG2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=2 custid=2 ddate=01SEP2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=2 custid=2 ddate=01SEP2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=2 custid=2 ddate=01OCT2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=2 custid=2 ddate=01OCT2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=2 custid=2 ddate=01NOV2022 ind_fail=1 pop=7 pop7=0 failure=1 last.CustID=1
(4)  _N_=2 custid=2 ddate=01NOV2022 ind_fail=1 pop=7 pop7=1 failure=1
(5)  _N_=2 custid=2 ddate=01NOV2022 ind_fail=1 pop=7 pop7=1 failure=1
(1)  _N_=3 custid=2 ddate=01NOV2022 ind_fail=1 pop=7 pop7=. failure=.
(2)  _N_=3 custid=2 ddate=01NOV2022 ind_fail=1 pop=7 pop7=0 failure=0
(3)  _N_=3 custid=3 ddate=01APR2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=3 custid=3 ddate=01APR2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=3 custid=3 ddate=01MAY2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=3 custid=3 ddate=01MAY2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=3 custid=3 ddate=01JUN2022 ind_fail=1 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=3 custid=3 ddate=01JUN2022 ind_fail=1 pop=7 pop7=0 failure=1
(3)  _N_=3 custid=3 ddate=01JUL2022 ind_fail=1 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=3 custid=3 ddate=01JUL2022 ind_fail=1 pop=7 pop7=0 failure=1
(3)  _N_=3 custid=3 ddate=01AUG2022 ind_fail=1 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=3 custid=3 ddate=01AUG2022 ind_fail=1 pop=7 pop7=0 failure=1
(3)  _N_=3 custid=3 ddate=01SEP2022 ind_fail=1 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=3 custid=3 ddate=01SEP2022 ind_fail=1 pop=7 pop7=0 failure=1
(3)  _N_=3 custid=3 ddate=01OCT2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=3 custid=3 ddate=01OCT2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=3 custid=3 ddate=01NOV2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=1
(4)  _N_=3 custid=3 ddate=01NOV2022 ind_fail=0 pop=7 pop7=1 failure=1
(5)  _N_=3 custid=3 ddate=01NOV2022 ind_fail=0 pop=7 pop7=1 failure=1
(1)  _N_=4 custid=3 ddate=01NOV2022 ind_fail=0 pop=7 pop7=. failure=.
(2)  _N_=4 custid=3 ddate=01NOV2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=4 custid=4 ddate=01APR2022 ind_fail=0 pop=8 pop7=0 failure=0 last.CustID=0
(4)  _N_=4 custid=4 ddate=01APR2022 ind_fail=0 pop=8 pop7=0 failure=0
(3)  _N_=4 custid=4 ddate=01MAY2022 ind_fail=0 pop=8 pop7=0 failure=0 last.CustID=0
(4)  _N_=4 custid=4 ddate=01MAY2022 ind_fail=0 pop=8 pop7=0 failure=0
(3)  _N_=4 custid=4 ddate=01JUN2022 ind_fail=1 pop=8 pop7=0 failure=0 last.CustID=0
(4)  _N_=4 custid=4 ddate=01JUN2022 ind_fail=1 pop=8 pop7=0 failure=1
(3)  _N_=4 custid=4 ddate=01JUL2022 ind_fail=0 pop=8 pop7=0 failure=1 last.CustID=0
(4)  _N_=4 custid=4 ddate=01JUL2022 ind_fail=0 pop=8 pop7=0 failure=1
(3)  _N_=4 custid=4 ddate=01AUG2022 ind_fail=0 pop=8 pop7=0 failure=1 last.CustID=0
(4)  _N_=4 custid=4 ddate=01AUG2022 ind_fail=0 pop=8 pop7=0 failure=1
(3)  _N_=4 custid=4 ddate=01SEP2022 ind_fail=0 pop=8 pop7=0 failure=1 last.CustID=0
(4)  _N_=4 custid=4 ddate=01SEP2022 ind_fail=0 pop=8 pop7=0 failure=1
(3)  _N_=4 custid=4 ddate=01OCT2022 ind_fail=0 pop=8 pop7=0 failure=1 last.CustID=0
(4)  _N_=4 custid=4 ddate=01OCT2022 ind_fail=0 pop=8 pop7=0 failure=1
(3)  _N_=4 custid=4 ddate=01NOV2022 ind_fail=0 pop=8 pop7=0 failure=1 last.CustID=1
(4)  _N_=4 custid=4 ddate=01NOV2022 ind_fail=0 pop=8 pop7=0 failure=1
(5)  _N_=4 custid=4 ddate=01NOV2022 ind_fail=0 pop=8 pop7=0 failure=1
(1)  _N_=5 custid=4 ddate=01NOV2022 ind_fail=0 pop=8 pop7=. failure=.
(2)  _N_=5 custid=4 ddate=01NOV2022 ind_fail=0 pop=8 pop7=0 failure=0
(3)  _N_=5 custid=5 ddate=01APR2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=5 custid=5 ddate=01APR2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=5 custid=5 ddate=01MAY2022 ind_fail=0 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=5 custid=5 ddate=01MAY2022 ind_fail=0 pop=7 pop7=0 failure=0
(3)  _N_=5 custid=5 ddate=01JUN2022 ind_fail=1 pop=7 pop7=0 failure=0 last.CustID=0
(4)  _N_=5 custid=5 ddate=01JUN2022 ind_fail=1 pop=7 pop7=0 failure=1
(3)  _N_=5 custid=5 ddate=01JUL2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=5 custid=5 ddate=01JUL2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=5 custid=5 ddate=01AUG2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=5 custid=5 ddate=01AUG2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=5 custid=5 ddate=01SEP2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=5 custid=5 ddate=01SEP2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=5 custid=5 ddate=01OCT2022 ind_fail=0 pop=7 pop7=0 failure=1 last.CustID=0
(4)  _N_=5 custid=5 ddate=01OCT2022 ind_fail=0 pop=7 pop7=0 failure=1
(3)  _N_=5 custid=5 ddate=01NOV2022 ind_fail=0 pop=8 pop7=0 failure=1 last.CustID=1
(4)  _N_=5 custid=5 ddate=01NOV2022 ind_fail=0 pop=8 pop7=0 failure=1
(5)  _N_=5 custid=5 ddate=01NOV2022 ind_fail=0 pop=8 pop7=0 failure=1
(1)  _N_=6 custid=5 ddate=01NOV2022 ind_fail=0 pop=8 pop7=. failure=.
(2)  _N_=6 custid=5 ddate=01NOV2022 ind_fail=0 pop=8 pop7=0 failure=0
NOTE: There were 40 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 5 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

 

Quentin
Super User

@Ronein wrote:
Thanks.
Regarding question 3- let's talk about customer 1.
There are 8 rows for customer 1.
Are there 8 iterations for customer 1?

Yes, the DO UNTIL loop will iterate 8 times for customer 1.

 

As I understand in each iteration there is set of one row and the next row delete the previous row and so on so at the end only the last row left? Is it true?

This is close to correct.  The DATA step uses a program data vector (PDV) to do its work.  You can think of the PDV as a single row (vector), which the DATA step uses to hold data it is working on.  Every DATA step has one PDV, and it is only one row, and it has all the variables that exist in the DATA step.   When the SET statement executes, it reads one record from the dataset into the PDV.  Then next time the SET statement executes, it reads the next record from the dataset into the PDV.  Nothing is deleted from the PDV.  But each time data is read into the PDV, it overwrites any data that was there before.

 

How can I understand and see the process? Is it possible to see what sas do in each step ?( I mean to see the data in eqxh step)?

You can use PUT statements to write the values in the PDV to the log.  This lets you see the values of variables in the PDV at any point.  For example, you could update your code to be:

 

 

data want;
  pop7=0;
  failure=0;
  do until(last.CustID);
    set have;
    by CustID;
    if (ddate='01NOV2022'd and pop=7) then POP7=1;
    if ddate in ('01OCT2022'd,'01SEP2022'd ,'01AUG2022'd,'01JUL2022'd,'01JUN2022'd) and Ind_Fail=1 then FAILURE=1;

    put "Bottom of DO until loop " (_N_ CustID ddate pop Ind_Fail pop7 failure last.CustID)(=) ;
  end;

  
  put / "Bottom of DATA step loop. This record will be output: " /  (_N_ CustID ddate pop Ind_Fail pop7 failure last.CustID)(=) / ;
run;

 

 

Regarding question 1- why is the purpose of initiate values 0 for 2 variables? Is outcome be different if we don't do it?

Yes, the outcome will be different.  Try commenting out statements, then run the code, and look at values the PUT statement writes to the log.

 

For more reading, I would recommend: https://support.sas.com/resources/papers/proceedings09/038-2009.pdf .

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Ronein
Meteorite | Level 14
Customer1 for example has 8 rows.
As I understand sas read all of these 8 rows.
If sas read all these 8 rows and finally there is only one row ( of last observation) then what process cause it?
ballardw
Super User

@Ronein wrote:
Customer1 for example has 8 rows.
As I understand sas read all of these 8 rows.
If sas read all these 8 rows and finally there is only one row ( of last observation) then what process cause it?

Cause what?

Astounding
PROC Star
The posters you have been working with all understand the process perfectly. My only contribution here will be to simplify the process to highlight answers to some of the questions you ask. Study the results and see what makes sense.
data have;
do cust =1 to 5;
do obs =1 to 8;
n + 1;
output;
end;
end;
run;
That gives you a simple data set you can play with; So here is one way to play.
data want;
do until (last.cust);
put _all_ "before";
set have;
by cust;
put _all_ "after";
end;
run;
Hope this helps rather than confuses.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 660 views
  • 0 likes
  • 5 in conversation