DATA Step, Macro, Functions and more

Dow Loop-Basics

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,040
Accepted Solution

Dow Loop-Basics

The example is from page 2 of this paper.

http://analytics.ncsu.edu/sesug/2010/BB13.Dorfman.pdf

"If VAR is missing CONTINUE passes the control straight to the bottom of the loop".

What is the meaning of bottom of the loop? Is it not gona calculate Mcount Prod and SUm for that variable and goes to the next variable but still with in the loop?Is that the meaning ?

Secondly,

"PROD and COUNT are

set to 1, and the non-retained SUM, MEAN, and MCOUNT are set to missing by the default action of the

implied loop (program control at the top of the implied loop)."

This sentence says that SUM Mean and Count are set to missing . But what lines in the code sets these values to missing??

Any help is greatly appreciated possibly with an example would be great

Thanks

Data B ( Keep = Id Prod Sum Count Mean) ;

  Prod = 1 ;

   Do Count = 1 By 1 Until ( Last.Id ) ;

   Set A ;

    By Id ;

       If Missing (Var) Then Continue ;

      Mcount = Sum (Mcount, 1) ;

       Prod = Prod * Var ;

       Sum = Sum (Sum, Var) ;

       End ;

       Mean = Sum / Mcount ;

Run ;

Message was edited by: Karun Diri Also I feel the keep statement would have been good with the set statement rather than the Data B Any suggestions??


Accepted Solutions
Solution
‎10-05-2012 03:59 PM
PROC Star
Posts: 7,363

Re: Dow Loop-Basics

MyBad!  I had two step number 4s.  I meant for you to run the following in one run .. not try to run it in parts:

data a;

  input id var;

  cards;

1 10

1 .

1 20

2 .

2 20

2 30

;

Data B ( Keep = Id Prod Sum Count Mean) ;

  Prod = 1 ;

  Do Count = 1 By 1 Until ( Last.Id ) ;

    Set A ;

    By Id ;

putlog "(1)" _all_;

    If Missing (Var) Then Continue ;

putlog "(2)" _all_;

    Mcount = Sum (Mcount, 1) ;

    Prod = Prod * Var ;

putlog "(3)" _all_;

    Sum = Sum (Sum, Var) ;

putlog "(4)" _all_;

  End ;

  Mean = Sum / Mcount ;

putlog "(5)" _all_;

Run ;

You should end up reading all six records, but only reaching step 5 twice and only outputting two records.

However, I agree with Rick, that this might be too complex for you at this point.  If you can answer your own questions after running the above code (that is to say, as is, not in parts), and answer your own questions by looking at what the log shows, you will at least learned a valuable lesson.

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: Dow Loop-Basics

Karun,

I think the best way to learn what is going on is to attempt the solution with some data, add some putlog statements in the code so that you can see what is happening within the PDV, and then attempt to answer your own questions.

E.g., try the following and let us know if you can answer your own questions:

data a;

  input id var;

  cards;

1 10

1 .

1 20

2 .

2 20

2 30

;

Data B ( Keep = Id Prod Sum Count Mean) ;

  Prod = 1 ;

  Do Count = 1 By 1 Until ( Last.Id ) ;

    Set A ;

    By Id ;

putlog "(1)" _all_;

    If Missing (Var) Then Continue ;

putlog "(2)" _all_;

    Mcount = Sum (Mcount, 1) ;

    Prod = Prod * Var ;

putlog "(3)" _all_;

    Sum = Sum (Sum, Var) ;

putlog "(4)" _all_;

  End ;

  Mean = Sum / Mcount ;

putlog "(5)" _all_;

Run ;

Super Contributor
Posts: 1,040

Re: Dow Loop-Basics

Sorry about that.

I figured it out.

I first ran it until putlog "(1)" _all_;

WARNING: The variable Sum in the DROP, KEEP, or RENAME list has never been referenced.

WARNING: The variable Mean in the DROP, KEEP, or RENAME list has never been referenced.

(1)Prod=1 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 _ERROR_=0 _N_=1

(1)Prod=1 Count=2 Last.Id=0 id=1 var=. FIRST.id=0 _ERROR_=0 _N_=1

(1)Prod=1 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 _ERROR_=0 _N_=1

(1)Prod=1 Count=1 Last.Id=0 id=2 var=. FIRST.id=1 _ERROR_=0 _N_=2

(1)Prod=1 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 _ERROR_=0 _N_=2

(1)Prod=1 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 _ERROR_=0 _N_=2

Compared that output with running until putlog "(2)" _all_;

WARNING: The variable Sum in the DROP, KEEP, or RENAME list has never been referenced.

WARNING: The variable Mean in the DROP, KEEP, or RENAME list has never been referenced.

(2)Prod=1 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 _ERROR_=0 _N_=1

(2)Prod=1 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 _ERROR_=0 _N_=1

(2)Prod=1 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 _ERROR_=0 _N_=2

(2)Prod=1 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 _ERROR_=0 _N_=2

Putlog(2) output will not have var=. because continue passes to the bottom of the loop directly if the var=.

So of the 6 obs we now have only 4 observation

IS that right?

Thanks

Solution
‎10-05-2012 03:59 PM
PROC Star
Posts: 7,363

Re: Dow Loop-Basics

MyBad!  I had two step number 4s.  I meant for you to run the following in one run .. not try to run it in parts:

data a;

  input id var;

  cards;

1 10

1 .

1 20

2 .

2 20

2 30

;

Data B ( Keep = Id Prod Sum Count Mean) ;

  Prod = 1 ;

  Do Count = 1 By 1 Until ( Last.Id ) ;

    Set A ;

    By Id ;

putlog "(1)" _all_;

    If Missing (Var) Then Continue ;

putlog "(2)" _all_;

    Mcount = Sum (Mcount, 1) ;

    Prod = Prod * Var ;

putlog "(3)" _all_;

    Sum = Sum (Sum, Var) ;

putlog "(4)" _all_;

  End ;

  Mean = Sum / Mcount ;

putlog "(5)" _all_;

Run ;

You should end up reading all six records, but only reaching step 5 twice and only outputting two records.

However, I agree with Rick, that this might be too complex for you at this point.  If you can answer your own questions after running the above code (that is to say, as is, not in parts), and answer your own questions by looking at what the log shows, you will at least learned a valuable lesson.

Super Contributor
Posts: 1,040

Re: Dow Loop-Basics

Thats great . I was running in parts just to see what happens at each step.

I already noted the difference between step1 and step2.

I agree with you both on this stuff looking complex to me. But i dont know for good or bad I have the attitude of going into the very depths even though its not required for me right now.

I see that most of the people post Answers on Dow loop. So I wanted to learn this stuff.

Thanks for making me walk through the basic concepts.....

Cheers

PROC Star
Posts: 7,363

Re: Dow Loop-Basics

Good!  When you are ready, post the answers to your original questions.  BTW, I find that taking advantage of the putlog _all_ statement is an excellent way to gain an understanding of how the datastep functions .. especially when one adds the complexities of a dow-type loop to the process.

Super Contributor
Posts: 1,040

Re: Dow Loop-Basics

Yes I will. Thanks once again

Super Contributor
Posts: 1,040

Re: Dow Loop-Basics

Thanks a lot . I will frame the answer but it will be based on your example........and i will explain the put log  with what is happening in every step. Will post it later today.

Cheers

Super Contributor
Posts: 1,040

Re: Dow Loop-Basics

*The following is the log for the dow loop by using putlog in the code and knowing what happens at each stage.

Arthurs example dataset above is used for analysis

Please let me know if I went wrong

I am worried abt the explanation I gave for Putlog(3).......PLease correct

data a;

  input id var;

  cards;

1 10

1 .

1 20

2 .

2 20

2 30

;

Putlog 1

*For the datastep the PDV is set up at the time of compilation with all the information for the variables as well as the automatic variables(_n_ and _error_). Here missing variables are also included.


20   Data B ( Keep = Id Prod Sum Count Mean) ;
21
22     Prod = 1 ;
23
24     Do Count = 1 By 1 Until ( Last.Id ) ;
25
26       Set A ;
27
28       By Id ;
29
30   putlog "(1)" _all_;
31   END;
32   RUN;

(1)Prod=1 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 _ERROR_=0 _N_=1
(1)Prod=1 Count=2 Last.Id=0 id=1 var=. FIRST.id=0 _ERROR_=0 _N_=1
(1)Prod=1 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 _ERROR_=0 _N_=1
(1)Prod=1 Count=1 Last.Id=0 id=2 var=. FIRST.id=1 _ERROR_=0 _N_=2
(1)Prod=1 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 _ERROR_=0 _N_=2
(1)Prod=1 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 _ERROR_=0 _N_=2

Putlog 2

In this part of the code below since the continue is used if var is missing the contol transfrs the loop to the end statement.

That is why in this stage we have all the information for only the variables with non missing var;


Data B ( Keep = Id Prod Sum Count Mean) ;
34
35     Prod = 1 ;
36
37     Do Count = 1 By 1 Until ( Last.Id ) ;
38
39       Set A ;
40
41       By Id ;
42
43
44
45
     If Missing (Var) Then Continue ;
47
48   putlog "(2)" _all_;
49   END;
50   RUN;

(2)Prod=1 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 _ERROR_=0 _N_=1
(2)Prod=1 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 _ERROR_=0 _N_=1
(2)Prod=1 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 _ERROR_=0 _N_=2
(2)Prod=1 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 _ERROR_=0 _N_=2

Putlog 3

When we use a by group with in the DOW loop SAS keeps track of the DATA within each BY group.

Meaning prod value in this step below  is retained in the group now  when it has to have a value of 1 (we assigned at the beginning and is supposed to retain usually since this variable is being created in this datastep and not coming from dataset A) each time...!!!!!!!!!


51   Data B ( Keep = Id Prod Sum Count Mean) ;
52
53     Prod = 1 ;
54
55     Do Count = 1 By 1 Until ( Last.Id ) ;
56
57       Set A ;
58
59       By Id ;
60
61   If Missing (Var) Then Continue ;
62   Mcount = Sum (Mcount, 1) ;
63
64       Prod = Prod * Var ;
65
66   putlog "(3)" _all_;
67   END;
68   RUN;

(3)Prod=10 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 Mcount=1 _ERROR_=0 _N_=1
(3)Prod=200 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 Mcount=2 _ERROR_=0 _N_=1
(3)Prod=20 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 Mcount=1 _ERROR_=0 _N_=2
(3)Prod=600 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 Mcount=2 _ERROR_=0 _N_=2

Putlog 4

Now in each group the sum value has the cumulative of var because of  the stmnt

sum=sum(sum,var);


69   Data B ( Keep = Id Prod Sum Count Mean) ;
70
71     Prod = 1 ;
72
73     Do Count = 1 By 1 Until ( Last.Id ) ;
74
75       Set A ;
76
77       By Id ;
78
79   If Missing (Var) Then Continue ;
80   Mcount = Sum (Mcount, 1) ;
81
82       Prod = Prod * Var ;
83   Sum = Sum (Sum, Var) ;
84
85   putlog "(4)" _all_;
86   END;
87   RUN;

(4)Prod=10 Count=1 Last.Id=0 id=1 var=10 FIRST.id=1 Mcount=1 Sum=10 _ERROR_=0 _N_=1
(4)Prod=200 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 Mcount=2 Sum=30 _ERROR_=0 _N_=1
(4)Prod=20 Count=2 Last.Id=0 id=2 var=20 FIRST.id=0 Mcount=1 Sum=20 _ERROR_=0 _N_=2
(4)Prod=600 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 Mcount=2 Sum=50 _ERROR_=0 _N_=2

Putlog 5

In the step below 30 is the final sum for ID 1 group and 50 is trhe sum for ID 2 group;

since our data is divided into 2 groups by the BY ID statement

At this point we went Outside the loop and said  Mean = Sum / Mcount ;

Final sum's for each BY variable is divided by the final Mcount  and finallly end up with the 2 observations having value of

Mean=15 for group ID1

Mean=25  for group ID2

88   Data B ( Keep = Id Prod Sum Count Mean) ;
89
90     Prod = 1 ;
91
92     Do Count = 1 By 1 Until ( Last.Id ) ;
93
94       Set A ;
95
96       By Id ;
97
98   If Missing (Var) Then Continue ;
99   Mcount = Sum (Mcount, 1) ;
100
101      Prod = Prod * Var ;
102  Sum = Sum (Sum, Var) ;
103  END;
104
105                 /*notice that the mean stmnt is after the END stmnt*/
106
107    Mean = Sum / Mcount ;
108
109  putlog "(5)" _all_;
110
111  Run ;

(5)Prod=200 Count=3 Last.Id=1 id=1 var=20 FIRST.id=0 Mcount=2 Sum=30 Mean=15 _ERROR_=0 _N_=1
(5)Prod=600 Count=3 Last.Id=1 id=2 var=30 FIRST.id=0 Mcount=2 Sum=50 Mean=25 _ERROR_=0 _N_=2

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 296 views
  • 0 likes
  • 2 in conversation