The example is from page 2 of this paper.
http://analytics.ncsu.edu/sesug/2010/BB13.Dorfman.pdf
"If VAR is missing CONTINUE passes the control straight to the bottom of the loop".
What is the meaning of bottom of the loop? Is it not gona calculate Mcount Prod and SUm for that variable and goes to the next variable but still with in the loop?Is that the meaning ?
Secondly,
"PROD and COUNT are
set to 1, and the non-retained SUM, MEAN, and MCOUNT are set to missing by the default action of the
implied loop (program control at the top of the implied loop)."
This sentence says that SUM Mean and Count are set to missing . But what lines in the code sets these values to missing??
Any help is greatly appreciated possibly with an example would be great
Thanks
Data B ( Keep = Id Prod Sum Count Mean) ;
Prod = 1 ;
Do Count = 1 By 1 Until ( Last.Id ) ;
Set A ;
By Id ;
If Missing (Var) Then Continue ;
Mcount = Sum (Mcount, 1) ;
Prod = Prod * Var ;
Sum = Sum (Sum, Var) ;
End ;
Mean = Sum / Mcount ;
Run ;
Message was edited by: Karun Diri Also I feel the keep statement would have been good with the set statement rather than the Data B Any suggestions??
MyBad! I had two step number 4s. I meant for you to run the following in one run .. not try to run it in parts:
data a;
input id var;
cards;
1 10
1 .
1 20
2 .
2 20
2 30
;
Data B ( Keep = Id Prod Sum Count Mean) ;
Prod = 1 ;
Do Count = 1 By 1 Until ( Last.Id ) ;
Set A ;
By Id ;
putlog "(1)" _all_;
If Missing (Var) Then Continue ;
putlog "(2)" _all_;
Mcount = Sum (Mcount, 1) ;
Prod = Prod * Var ;
putlog "(3)" _all_;
Sum = Sum (Sum, Var) ;
putlog "(4)" _all_;
End ;
Mean = Sum / Mcount ;
putlog "(5)" _all_;
Run ;
You should end up reading all six records, but only reaching step 5 twice and only outputting two records.
However, I agree with Rick, that this might be too complex for you at this point. If you can answer your own questions after running the above code (that is to say, as is, not in parts), and answer your own questions by looking at what the log shows, you will at least learned a valuable lesson.
Karun,
I think the best way to learn what is going on is to attempt the solution with some data, add some putlog statements in the code so that you can see what is happening within the PDV, and then attempt to answer your own questions.
E.g., try the following and let us know if you can answer your own questions:
data a;
input id var;
cards;
1 10
1 .
1 20
2 .
2 20
2 30
;
Data B ( Keep = Id Prod Sum Count Mean) ;
Prod = 1 ;
Do Count = 1 By 1 Until ( Last.Id ) ;
Set A ;
By Id ;
putlog "(1)" _all_;
If Missing (Var) Then Continue ;
putlog "(2)" _all_;
Mcount = Sum (Mcount, 1) ;
Prod = Prod * Var ;
putlog "(3)" _all_;
Sum = Sum (Sum, Var) ;
putlog "(4)" _all_;
End ;
Mean = Sum / Mcount ;
putlog "(5)" _all_;
Run ;
Sorry about that.
I figured it out.
I first ran it until putlog "(1)" _all_;
WARNING: The variable Sum in the DROP, KEEP, or RENAME list has never been referenced.
WARNING: The variable Mean in the DROP, KEEP, or RENAME list has never been referenced.
(1)Prod=1 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 _ERROR_=0 _N_=1
(1)Prod=1 Count=2 Last.Id=0 id=1 var=. FIRST.id=0 _ERROR_=0 _N_=1
(1)Prod=1 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 _ERROR_=0 _N_=1
(1)Prod=1 Count=1 Last.Id=0 id=2 var=. FIRST.id=1 _ERROR_=0 _N_=2
(1)Prod=1 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 _ERROR_=0 _N_=2
(1)Prod=1 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 _ERROR_=0 _N_=2
Compared that output with running until putlog "(2)" _all_;
WARNING: The variable Sum in the DROP, KEEP, or RENAME list has never been referenced.
WARNING: The variable Mean in the DROP, KEEP, or RENAME list has never been referenced.
(2)Prod=1 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 _ERROR_=0 _N_=1
(2)Prod=1 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 _ERROR_=0 _N_=1
(2)Prod=1 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 _ERROR_=0 _N_=2
(2)Prod=1 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 _ERROR_=0 _N_=2
Putlog(2) output will not have var=. because continue passes to the bottom of the loop directly if the var=.
So of the 6 obs we now have only 4 observation
IS that right?
Thanks
MyBad! I had two step number 4s. I meant for you to run the following in one run .. not try to run it in parts:
data a;
input id var;
cards;
1 10
1 .
1 20
2 .
2 20
2 30
;
Data B ( Keep = Id Prod Sum Count Mean) ;
Prod = 1 ;
Do Count = 1 By 1 Until ( Last.Id ) ;
Set A ;
By Id ;
putlog "(1)" _all_;
If Missing (Var) Then Continue ;
putlog "(2)" _all_;
Mcount = Sum (Mcount, 1) ;
Prod = Prod * Var ;
putlog "(3)" _all_;
Sum = Sum (Sum, Var) ;
putlog "(4)" _all_;
End ;
Mean = Sum / Mcount ;
putlog "(5)" _all_;
Run ;
You should end up reading all six records, but only reaching step 5 twice and only outputting two records.
However, I agree with Rick, that this might be too complex for you at this point. If you can answer your own questions after running the above code (that is to say, as is, not in parts), and answer your own questions by looking at what the log shows, you will at least learned a valuable lesson.
Thats great . I was running in parts just to see what happens at each step.
I already noted the difference between step1 and step2.
I agree with you both on this stuff looking complex to me. But i dont know for good or bad I have the attitude of going into the very depths even though its not required for me right now.
I see that most of the people post Answers on Dow loop. So I wanted to learn this stuff.
Thanks for making me walk through the basic concepts.....
Cheers
Good! When you are ready, post the answers to your original questions. BTW, I find that taking advantage of the putlog _all_ statement is an excellent way to gain an understanding of how the datastep functions .. especially when one adds the complexities of a dow-type loop to the process.
Yes I will. Thanks once again
Thanks a lot . I will frame the answer but it will be based on your example........and i will explain the put log with what is happening in every step. Will post it later today.
Cheers
*The following is the log for the dow loop by using putlog in the code and knowing what happens at each stage.
Arthurs example dataset above is used for analysis
Please let me know if I went wrong
I am worried abt the explanation I gave for Putlog(3).......PLease correct
data a;
input id var;
cards;
1 10
1 .
1 20
2 .
2 20
2 30
;
Putlog 1
*For the datastep the PDV is set up at the time of compilation with all the information for the variables as well as the automatic variables(_n_ and _error_). Here missing variables are also included.
20 Data B ( Keep = Id Prod Sum Count Mean) ;
21
22 Prod = 1 ;
23
24 Do Count = 1 By 1 Until ( Last.Id ) ;
25
26 Set A ;
27
28 By Id ;
29
30 putlog "(1)" _all_;
31 END;
32 RUN;
(1)Prod=1 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 _ERROR_=0 _N_=1
(1)Prod=1 Count=2 Last.Id=0 id=1 var=. FIRST.id=0 _ERROR_=0 _N_=1
(1)Prod=1 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 _ERROR_=0 _N_=1
(1)Prod=1 Count=1 Last.Id=0 id=2 var=. FIRST.id=1 _ERROR_=0 _N_=2
(1)Prod=1 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 _ERROR_=0 _N_=2
(1)Prod=1 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 _ERROR_=0 _N_=2
Putlog 2
In this part of the code below since the continue is used if var is missing the contol transfrs the loop to the end statement.
That is why in this stage we have all the information for only the variables with non missing var;
Data B ( Keep = Id Prod Sum Count Mean) ;
34
35 Prod = 1 ;
36
37 Do Count = 1 By 1 Until ( Last.Id ) ;
38
39 Set A ;
40
41 By Id ;
42
43
44
45
If Missing (Var) Then Continue ;
47
48 putlog "(2)" _all_;
49 END;
50 RUN;
(2)Prod=1 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 _ERROR_=0 _N_=1
(2)Prod=1 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 _ERROR_=0 _N_=1
(2)Prod=1 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 _ERROR_=0 _N_=2
(2)Prod=1 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 _ERROR_=0 _N_=2
Putlog 3
When we use a by group with in the DOW loop SAS keeps track of the DATA within each BY group.
Meaning prod value in this step below is retained in the group now when it has to have a value of 1 (we assigned at the beginning and is supposed to retain usually since this variable is being created in this datastep and not coming from dataset A) each time...!!!!!!!!!
51 Data B ( Keep = Id Prod Sum Count Mean) ;
52
53 Prod = 1 ;
54
55 Do Count = 1 By 1 Until ( Last.Id ) ;
56
57 Set A ;
58
59 By Id ;
60
61 If Missing (Var) Then Continue ;
62 Mcount = Sum (Mcount, 1) ;
63
64 Prod = Prod * Var ;
65
66 putlog "(3)" _all_;
67 END;
68 RUN;
(3)Prod=10 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 Mcount=1 _ERROR_=0 _N_=1
(3)Prod=200 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 Mcount=2 _ERROR_=0 _N_=1
(3)Prod=20 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 Mcount=1 _ERROR_=0 _N_=2
(3)Prod=600 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 Mcount=2 _ERROR_=0 _N_=2
Putlog 4
Now in each group the sum value has the cumulative of var because of the stmnt
sum=sum(sum,var);
69 Data B ( Keep = Id Prod Sum Count Mean) ;
70
71 Prod = 1 ;
72
73 Do Count = 1 By 1 Until ( Last.Id ) ;
74
75 Set A ;
76
77 By Id ;
78
79 If Missing (Var) Then Continue ;
80 Mcount = Sum (Mcount, 1) ;
81
82 Prod = Prod * Var ;
83 Sum = Sum (Sum, Var) ;
84
85 putlog "(4)" _all_;
86 END;
87 RUN;
(4)Prod=10 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 Mcount=1 Sum=10 _ERROR_=0 _N_=1
(4)Prod=200 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 Mcount=2 Sum=30 _ERROR_=0 _N_=1
(4)Prod=20 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 Mcount=1 Sum=20 _ERROR_=0 _N_=2
(4)Prod=600 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 Mcount=2 Sum=50 _ERROR_=0 _N_=2
Putlog 5
In the step below 30 is the final sum for ID 1 group and 50 is trhe sum for ID 2 group;
since our data is divided into 2 groups by the BY ID statement
At this point we went Outside the loop and said Mean = Sum / Mcount ;
Final sum's for each BY variable is divided by the final Mcount and finallly end up with the 2 observations having value of
Mean=15 for group ID1
Mean=25 for group ID2
88 Data B ( Keep = Id Prod Sum Count Mean) ;
89
90 Prod = 1 ;
91
92 Do Count = 1 By 1 Until ( Last.Id ) ;
93
94 Set A ;
95
96 By Id ;
97
98 If Missing (Var) Then Continue ;
99 Mcount = Sum (Mcount, 1) ;
100
101 Prod = Prod * Var ;
102 Sum = Sum (Sum, Var) ;
103 END;
104
105 /*notice that the mean stmnt is after the END stmnt*/
106
107 Mean = Sum / Mcount ;
108
109 putlog "(5)" _all_;
110
111 Run ;
(5)Prod=200 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 Mcount=2 Sum=30 Mean=15 _ERROR_=0 _N_=1
(5)Prod=600 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 Mcount=2 Sum=50 Mean=25 _ERROR_=0 _N_=2
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.