Doing some adding and replacing in my data set

Occasional Contributor
Posts: 6


I'd like to create a new data set that adds together the responsetime readings at trial_line 4 and 5 and stores the total under trial_line 4. In the sample data below, nothing would change for the variables subject and item, but trial_line would go (1, 2, 3, 4, 6, 1, 2, 3, 4, 6) and responsetime would go (395, 409, 398, 766, 401, 343, 343, 343, 679, 409), where 766 and 679 are the new sums. I think this is fairly simple to do, but I am unsure how to go about it.


data reading2;
   input subject $ item $ trial_line responsetime;
   datalines;
438 5 1 395
438 5 2 409
438 5 3 398
438 5 4 380
438 5 5 386
438 5 6 401
438 6 1 343
438 6 2 343
438 6 3 343
438 6 4 311
438 6 5 368
438 6 6 409
;
run;


Super User
Posts: 12,712

Re: Doing some adding and replacing in my data set

Posted in reply to PeteCthulu

There are several approaches to this type of problem. Questions to answer first:

Does the data set have variables other than those shown? If so, what should happen to them?

Do you always have both a 4 and a 5, or is it possible to have a 4 without a 5? Do you ever have values for trial_line other than 1 through 6?

Is the data actually sorted by subject, item and trial_line?


Depending on the answers above, there are ways that would use the Retain or Lag and Output statements, but when the rule is simple I am fond of using a custom format to create the groups and then a summary procedure to do the accumulation.


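proc format;
   /* display trial_line values 4 and 5 as a single group labelled 4 */
   value trial_line
      4,5 = '4';
run;

/* replace have with the name of your data set, e.g. reading2 */
proc summary data=have nway;
   class subject item trial_line;
   format trial_line trial_line.;
   var responsetime;
   output out=want (drop=_type_ _freq_) sum=;
run;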

The trial_line format groups the values 4 and 5 under a single display value of 4, and proc summary (or proc means) uses that formatted grouping when trial_line appears on the class statement. Class variables define the groups, and the variable(s) to sum (or average, or what have you) go on the var statement. The single statistic sum= says to sum the variable and keep the original variable name for the result. If you request multiple statistics, you need to either name each result explicitly or use an option such as / autoname on the output statement to generate names with the statistic abbreviation appended. The drop= data set option removes _type_ and _freq_, variables that proc summary adds to describe each output row. Nway restricts the output to combinations of all the class variables; without it, proc summary also generates summaries for the other combinations of class variables.
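
To illustrate the multiple-statistics case, here is a minimal sketch that requests both a sum and a mean and lets autoname build the result names. It assumes the trial_line format from the earlier step has already been defined; the output data set name want_stats is just a placeholder.

```sas
/* Sketch only: assumes the trial_line format is already defined.      */
/* want_stats is a hypothetical output name chosen for this example.   */
proc summary data=have nway;
   class subject item trial_line;
   format trial_line trial_line.;
   var responsetime;
   /* autoname appends the statistic to each variable name */
   output out=want_stats (drop=_type_ _freq_) sum= mean= / autoname;
run;
```

With autoname the results should come out as responsetime_Sum and responsetime_Mean, so the two statistics do not collide on the original variable name.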


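For comparison, the Retain route mentioned earlier might look something like the sketch below. It is only a rough outline under a strong assumption: within each subject and item, every trial_line 4 row is immediately followed by its trial_line 5 row. The names want_ds and rt4 are made up for the example.

```sas
/* Sketch only: assumes each trial_line 4 row is immediately followed  */
/* by the matching trial_line 5 row. want_ds and rt4 are hypothetical. */
data want_ds (drop=rt4);
   set reading2;
   retain rt4;                 /* keeps the held value across rows */
   if trial_line = 4 then do;
      rt4 = responsetime;      /* hold the reading from line 4 */
      delete;                  /* do not output this row yet */
   end;
   else if trial_line = 5 then do;
      responsetime = responsetime + rt4;   /* add the held reading */
      trial_line = 4;                      /* report the total under 4 */
   end;
run;
```

If a 4 can occur without a 5, this version silently drops that row, which is one reason the format plus proc summary approach is the safer default.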