Hi All,
I have the following dataset:
patid | y | wt | bmi |
1 | -4 | 130 | 26 |
1 | -2 | 120 | 25 |
1 | 0 | 100 | 24 |
1 | 1 | 99 | 22 |
2 | -1 | 120 | 24 |
2 | 0 | 110 | 23 |
2 | 2 | 98 | 21 |
3 | -3 | 155 | 25 |
3 | 0 | 120 | 23 |
4 | -8 | 130 | 23 |
4 | -7 | 112 | 24 |
5 | -9 | 145 | 25 |
5 | -4 | 123 | 24 |
5 | 1 | 100 | 23 |
5 | 3 | 85 | 21 |
6 | -8 | 130 | 25 |
6 | 1 | 85 | 20 |
I want to extract the post-wt and post-BMI from WT and BMI that is at y = 0 or closest to y toward negative values (first wt/bmi observation before 0), and create the following table:
patid | y | wt | bmi | post_wt | Post_BMI |
1 | -4 | 130 | 26 | ||
1 | -2 | 120 | 25 | ||
1 | 0 | 100 | 24 | 100 | 24 |
1 | 1 | 99 | 22 | ||
2 | -1 | 120 | 24 | ||
2 | 0 | 110 | 23 | 110 | 23 |
2 | 2 | 98 | 21 | ||
3 | -3 | 155 | 25 | ||
3 | 0 | 120 | 23 | 120 | 23 |
4 | -8 | 130 | 23 | ||
4 | -7 | 112 | 24 | 120 | 23 |
5 | -9 | 145 | 25 | ||
5 | -4 | 123 | 24 | 123 | 24 |
5 | 1 | 100 | 23 | ||
5 | 3 | 85 | 21 | ||
6 | -8 | 130 | 25 | 130 | 25 |
6 | 1 | 85 | 20 |
The SAS code for the "have" table:
data have;
input patid y wt bmi;
datalines;
1 -4 130 26
1 -2 120 25
1 0 100 24
1 1 99 22
2 -1 128 24
2 0 110 23
2 2 98 21
3 -3 155 25
3 0 120 23
4 -8 130 24
4 -7 112 23
5 -9 145 25
5 -4 123 24
5 1 100 23
5 3 85 21
6 0 90 21
6 1 85 20
;
run;
Thanks,
Sandyzman1
I think you want the values for any record wit y=0. But if a a PATID has no records with y=0, then you want the first record with the "least negative" value of y. (So what if a PATID only has positive y values).
Your DATA HAVE program data does not match the starting data in your table. I use the DATA step programming you provided:
data have;
input patid y wt bmi;
datalines;
1 -4 130 26
1 -2 120 25
1 0 100 24
1 1 99 22
2 -1 128 24
2 0 110 23
2 2 98 21
3 -3 155 25
3 0 120 23
4 -8 130 24
4 -7 112 23
5 -9 145 25
5 -4 123 24
5 1 100 23
5 3 85 21
6 0 90 21
6 1 85 20
;
run;
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by patid ;
retain _save_y;
if first.patid then call missing(_save_y);
if firstpass=1 and y<=0 then _save_y=max(y,_save_y);
if secondpass;
if y=_save_y then do;
post_wt=wt;
post_bmi=bmi;
end;
run;
The technique here is to pass through each PATID twice, first time to identify which value of Y is the selected record (identified with (_SAVE_Y). The second time to test for Y=_SAVE_Y and then copy WT and BMI.
This program assumes that the optimal Y value never occurs more than once for any PATID.
I think you want the values for any record wit y=0. But if a a PATID has no records with y=0, then you want the first record with the "least negative" value of y. (So what if a PATID only has positive y values).
Your DATA HAVE program data does not match the starting data in your table. I use the DATA step programming you provided:
data have;
input patid y wt bmi;
datalines;
1 -4 130 26
1 -2 120 25
1 0 100 24
1 1 99 22
2 -1 128 24
2 0 110 23
2 2 98 21
3 -3 155 25
3 0 120 23
4 -8 130 24
4 -7 112 23
5 -9 145 25
5 -4 123 24
5 1 100 23
5 3 85 21
6 0 90 21
6 1 85 20
;
run;
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by patid ;
retain _save_y;
if first.patid then call missing(_save_y);
if firstpass=1 and y<=0 then _save_y=max(y,_save_y);
if secondpass;
if y=_save_y then do;
post_wt=wt;
post_bmi=bmi;
end;
run;
The technique here is to pass through each PATID twice, first time to identify which value of Y is the selected record (identified with (_SAVE_Y). The second time to test for Y=_SAVE_Y and then copy WT and BMI.
This program assumes that the optimal Y value never occurs more than once for any PATID.
With a SAS data step looking back is easier to code for than looking ahead. If you sort your data by descending Y then using the lag() function will allow for such a look-back.
data have;
input patid y wt bmi;
datalines;
1 -4 130 26
1 -2 120 25
1 0 100 24
1 1 99 22
2 -1 128 24
2 0 110 23
2 2 98 21
3 -3 155 25
3 0 120 23
4 -8 130 24
4 -7 112 23
5 -9 145 25
5 -4 123 24
5 1 100 23
5 3 85 21
6 0 90 21
6 1 85 20
;
proc sort data=have out=inter;
by patid descending y;
run;
data want;
set inter;
by patid descending y;
post_flg=0;
if y<=0 and lag(y)>0 and patid=lag(patid) then post_flg=1;
/* not sure what should happen if there is only one row or */
/* if there are only positive values for Y */
else if first.patid and last.patid and y<=0 then post_flg=1;
/* now that the rows have been identified do whatever else you need */
if post_flg=1 then
do;
post_wt=wt;
post_bmi=bmi;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.