BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jimmychoi
Obsidian | Level 7

Hi, here's my code:

I can understand them all but the line with totsal+ salary;

 

Where did that totsal+ come from? and what does it do?

even this was not declared, the program works fine.

Is it something like: totsal = totsal + salary?

 

DATA employee;
	INPUT fname $ age;
	datalines;
	Bruce 30
	Dan 40
	;
RUN;

DATA salary;
	INPUT name $ salary;
	datalines;
	Bruce 25000
	Bruce 35000
	Dan 25000
	;
RUN;

data empdata;
	merge employee salary(rename=name=fname);
	by fname;
	totsal+ salary;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

This is a summation command. For each record in the data set, salary is added to the previous value of totsal (and totsal is zero at the start of the data set).


Example:

 

Salary     Totsal

100             100

125             225

70               295

 

Is it something like: totsal = totsal + salary?

 

No, that's different. That doesn't work unless somehow totsal has been assigned a value earlier in the data set for each record.

--
Paige Miller

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

This is a summation command. For each record in the data set, salary is added to the previous value of totsal (and totsal is zero at the start of the data set).


Example:

 

Salary     Totsal

100             100

125             225

70               295

 

Is it something like: totsal = totsal + salary?

 

No, that's different. That doesn't work unless somehow totsal has been assigned a value earlier in the data set for each record.

--
Paige Miller
hashman
Ammonite | Level 13

This construct is called "the SUM statement". It's equivalent to 2 statements:

  retain totsal 0 ; 
  totsal = totsal + salary ;

Thus, totsal is initialized to 0 at compile time, and the value of totsal is not set to a missing value every time the new iteration of the DATA step intrinsic loop begins (i.e. at the top of the step). This way, totsal is incremented by salary in each iteration. Without the RETAIN statement (or its implied equivalent in the SUM statement), totsal would be initially missing, and the + operator would try adding a non-missing value of salary to a missing value, which would result in a missing value, plus you'd get the "missing values were generated" message in the log. If, instead of the RETAIN above, you simply initialize total to 0, i.e. code:

 

 

  totsal = 0 ;
  totsal = totsal + salary ;

that would not work as you want, either. This is because in the first iteration of the implied loop, the summation would result in 0+salary=salary. However, in the very next iteration, totsal will be set to a missing value at the top of the step, then assigned 0, and as a result, you'd get totsal = salary for each output record. This behavior can be altered only if you take explicit control of the file-reading loop and thus prevent program control from ever reaching the top of the step:

data empdata ;
  totsal = 0 ;
  do until (eof) ;
    merge employee salary(rename=(name=fname)) end=eof ;
    by fname ;
    totsal  = totsal + salary ;
    output ;
  end ;
  stop ;
run;

The reason it works is that the initialization is now located outside the file-reading loop, and all processing is finished before program control can reach the top of the step again. This construct is a specific form of what is known as the DoW-loop. It has a number of advantages compared to doing everything inside the implied loop. Though in this case it offers none, thinking of how it works is useful to get a better idea of how the DATA step operates in general.  

 

Paul D.

 

 

 

 

  

data_null__
Jade | Level 19

@hashman wrote:

This construct is called "the SUM statement". It's equivalent to 2 statements:

  retain totsal 0 ; 
  totsal = totsal + salary ;

 

  What about missing?

40   data;
41      input salary;
42      retain totsal totsal2 0 ;
43      totsal = totsal + salary;
44      totsal2 = sum(totsal2,salary);
45      totsal3 + salary;
46      cards;

NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      3 at 43:20
NOTE: The data set WORK.DATA1 has 4 observations and 4 variables.

Capture.PNG

 


 

hashman
Ammonite | Level 13

John,

 

You're right about that. Assuming that salary is not missing is making assumptions about data.

I stand corrected. The definition of the SUM statement should be reformulated as that it is equivalent to the statements:

  retain var 0 ;
  var = sum (var, expression) ;

Thanks for pointing that out. Nicely dovetails into the point of the benefit of being contradicted I made one post eariler.

 

Best

Paul D.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 728 views
  • 1 like
  • 4 in conversation