DATA Step, Macro, Functions and more

use of plus(+) after a variable name?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

use of plus(+) after a variable name?

Hi, here's my code:

I can understand them all but the line with totsal+ salary;

 

Where did that totsal+ come from? and what does it do?

even this was not declared, the program works fine.

Is it something like: totsal = totsal + salary?

 

DATA employee;
	INPUT fname $ age;
	datalines;
	Bruce 30
	Dan 40
	;
RUN;

DATA salary;
	INPUT name $ salary;
	datalines;
	Bruce 25000
	Bruce 35000
	Dan 25000
	;
RUN;

data empdata;
	merge employee salary(rename=name=fname);
	by fname;
	totsal+ salary;
run;

Accepted Solutions
Solution
3 weeks ago
Respected Advisor
Posts: 3,247

Re: use of plus(+) after a variable name?

[ Edited ]
Posted in reply to jimmychoi

This is a summation command. For each record in the data set, salary is added to the previous value of totsal (and totsal is zero at the start of the data set).


Example:

 

Salary     Totsal

100             100

125             225

70               295

 

Is it something like: totsal = totsal + salary?

 

No, that's different. That doesn't work unless somehow totsal has been assigned a value earlier in the data set for each record.

--
Paige Miller

View solution in original post


All Replies
Solution
3 weeks ago
Respected Advisor
Posts: 3,247

Re: use of plus(+) after a variable name?

[ Edited ]
Posted in reply to jimmychoi

This is a summation command. For each record in the data set, salary is added to the previous value of totsal (and totsal is zero at the start of the data set).


Example:

 

Salary     Totsal

100             100

125             225

70               295

 

Is it something like: totsal = totsal + salary?

 

No, that's different. That doesn't work unless somehow totsal has been assigned a value earlier in the data set for each record.

--
Paige Miller
Frequent Contributor
Posts: 112

Re: use of plus(+) after a variable name?

Posted in reply to jimmychoi

This construct is called "the SUM statement". It's equivalent to 2 statements:

  retain totsal 0 ; 
  totsal = totsal + salary ;

Thus, totsal is initialized to 0 at compile time, and the value of totsal is not set to a missing value every time the new iteration of the DATA step intrinsic loop begins (i.e. at the top of the step). This way, totsal is incremented by salary in each iteration. Without the RETAIN statement (or its implied equivalent in the SUM statement), totsal would be initially missing, and the + operator would try adding a non-missing value of salary to a missing value, which would result in a missing value, plus you'd get the "missing values were generated" message in the log. If, instead of the RETAIN above, you simply initialize total to 0, i.e. code:

 

 

  totsal = 0 ;
  totsal = totsal + salary ;

that would not work as you want, either. This is because in the first iteration of the implied loop, the summation would result in 0+salary=salary. However, in the very next iteration, totsal will be set to a missing value at the top of the step, then assigned 0, and as a result, you'd get totsal = salary for each output record. This behavior can be altered only if you take explicit control of the file-reading loop and thus prevent program control from ever reaching the top of the step:

data empdata ;
  totsal = 0 ;
  do until (eof) ;
    merge employee salary(rename=(name=fname)) end=eof ;
    by fname ;
    totsal  = totsal + salary ;
    output ;
  end ;
  stop ;
run;

The reason it works is that the initialization is now located outside the file-reading loop, and all processing is finished before program control can reach the top of the step again. This construct is a specific form of what is known as the DoW-loop. It has a number of advantages compared to doing everything inside the implied loop. Though in this case it offers none, thinking of how it works is useful to get a better idea of how the DATA step operates in general.  

 

Paul D.

 

 

 

 

  

Respected Advisor
Posts: 3,864

Re: use of plus(+) after a variable name?


@hashman wrote:

This construct is called "the SUM statement". It's equivalent to 2 statements:

  retain totsal 0 ; 
  totsal = totsal + salary ;

 

  What about missing?

40   data;
41      input salary;
42      retain totsal totsal2 0 ;
43      totsal = totsal + salary;
44      totsal2 = sum(totsal2,salary);
45      totsal3 + salary;
46      cards;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      3 at 43:20
NOTE: The data set WORK.DATA1 has 4 observations and 4 variables.

Capture.PNG

 


 

Frequent Contributor
Posts: 112

Re: use of plus(+) after a variable name?

Posted in reply to data_null__

John,

 

You're right about that. Assuming that salary is not missing is making assumptions about data.

I stand corrected. The definition of the SUM statement should be reformulated as that it is equivalent to the statements:

  retain var 0 ;
  var = sum (var, expression) ;

Thanks for pointing that out. Nicely dovetails into the point of the benefit of being contradicted I made one post eariler.

 

Best

Paul D.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 84 views
  • 1 like
  • 4 in conversation