How to do row calculation by group

stataq · Posted 03-26-2024 02:37 PM

Hello

I want to do row calculation with the following rule:

The calculation is done per ID.

IF Y(Y= score, std) ) in time 1 is null and Y in time 2 and 3 are not null then Y in time 1 will be equal to Y in time 2* time 3.

Otherwise Y remain as is.

What is the way to do it please?

Data have;
input ID time score std;
cards;
111 1 . .
111 2 . .
111 3 7 5
222 1 4 6
222 2 5 7
222 3 4 8
333 1 . .
333 2 6 2
333 3 7 4
444 1 . .
444 2 6 4
444 3 7 2
;
Run;

expected output will looks like below:

ID	time	score	std
111	1	.	.
111	2	.	.
111	3	7	5
222	1	4	6
222	2	5	7
222	3	4	8
333	1	42	8
333	2	6	2
333	3	7	4
444	1	42	8
444	2	6	4
444	3	7	2

Tom · Posted 03-26-2024 03:15 PM

Data is not structed properly for that type of calculation, but that is easy to fix with PROC TRANSPOSE.

data have;
  input ID time score std;
cards;
111 1 . .
111 2 . .
111 3 7 5
222 1 4 6
222 2 5 7
222 3 4 8
333 1 . .
333 2 6 2
333 3 7 4
444 1 . .
444 2 6 4
444 3 7 2
;

proc transpose data=have out=haveT name=MEASURE prefix=Y;
  by id;
  id time;
  var score std;
run;

proc print;
run;

data want;
  set havet ;
  y1 = coalesce(y1,y2*y3);
run;

proc print;
run;

Results:

stataq · Posted 03-26-2024 03:59 PM

How can I change the data back to old arrangement?

mkeintz · Posted 03-26-2024 09:11 PM

If you want to preserve the original structure then you have to pass through each ID group twice, the first time to store the SCORE and STD for time 2 and time 3, the second pass to calculate a new SCORE and STD for time 1 if appropriate. The data are output during the second pass:

data have;
  input ID time score std;
cards;
111 1 . .
111 2 . .
111 3 7 5
222 1 4 6
222 2 5 7
222 3 4 8
333 1 . .
333 2 6 2
333 3 7 4
444 1 . .
444 2 6 4
444 3 7 2
run;

data want ;
  set have (in=firstpass)  have (in=secondpass);
  by id;
  array sc {2:3} _temporary_;
  array st {2:3} _temporary_;

  if first.id then call missing (of sc{*}, of st{*});
  if firstpass=1 and (2<=time<=3) then do;
    sc{time}=score;
    st{time}=std;
  end;

  if secondpass;
  if time=1 then do;
    if missing(score) and nmiss(of sc{*})=0 then score=sc{2}*sc{3};  
    if missing(std)   and nmiss(of st{*})=0 then std=st{2}*st{3};  
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 03-26-2024 10:24 PM

data have;
  input ID time score std;
cards;
111 1 . .
111 2 . .
111 3 7 5
222 1 4 6
222 2 5 7
222 3 4 8
333 1 . .
333 2 6 2
333 3 7 4
444 1 . .
444 2 6 4
444 3 7 2
;

data want;
 merge have 
       have(keep=ID score std rename=(id=id1 score=score1 std=std1) firstobs=2) 
       have(keep=ID score std rename=(id=id2 score=score2 std=std2) firstobs=3);
if ( id ne lag(id) and missing(score) and missing(std) )    and
   ( id eq id1 and not missing(score1) and not missing(std1) )   and
   ( id eq id2 and not missing(score2) and not missing(std2) ) then do; 
   score=score1*score2; std=std1*std2;
end;
drop id1 id2 score1 score2 std1 std2;
run;

How to do row calculation by group

Re: How to do row calculation by group

Re: How to do row calculation by group

Re: How to do row calculation by group

Re: How to do row calculation by group

How to do row calculation by group

Re: How to do row calculation by group

Re: How to do row calculation by group

Re: How to do row calculation by group

Re: How to do row calculation by group

Click image to register for webinar

Classroom Training Available!