Hi All,
I have the following dataset:
patid | y | wt | bmi |
1 | -4 | 130 | 26 |
1 | -2 | 120 | 25 |
1 | 0 | 100 | 24 |
1 | 1 | 99 | 22 |
2 | -1 | 120 | 24 |
2 | 0 | 110 | 23 |
2 | 2 | 98 | 21 |
3 | -3 | 155 | 25 |
3 | 0 | 120 | 23 |
4 | -8 | 130 | 23 |
4 | -7 | 112 | 24 |
5 | -9 | 145 | 25 |
5 | -4 | 123 | 24 |
5 | 1 | 100 | 23 |
5 | 3 | 85 | 21 |
6 | -8 | 130 | 25 |
6 | 1 | 85 | 20 |
I want to extract the post-wt and post-BMI from WT and BMI that is at y = 0 or closest to y toward negative values (first wt/bmi observation before 0), and create the following table:
patid | y | wt | bmi | post_wt | Post_BMI |
1 | -4 | 130 | 26 | ||
1 | -2 | 120 | 25 | ||
1 | 0 | 100 | 24 | 100 | 24 |
1 | 1 | 99 | 22 | ||
2 | -1 | 120 | 24 | ||
2 | 0 | 110 | 23 | 110 | 23 |
2 | 2 | 98 | 21 | ||
3 | -3 | 155 | 25 | ||
3 | 0 | 120 | 23 | 120 | 23 |
4 | -8 | 130 | 23 | ||
4 | -7 | 112 | 24 | 120 | 23 |
5 | -9 | 145 | 25 | ||
5 | -4 | 123 | 24 | 123 | 24 |
5 | 1 | 100 | 23 | ||
5 | 3 | 85 | 21 | ||
6 | -8 | 130 | 25 | 130 | 25 |
6 | 1 | 85 | 20 |
The SAS code for the "have" table:
data have;
input patid y wt bmi;
datalines;
1 -4 130 26
1 -2 120 25
1 0 100 24
1 1 99 22
2 -1 128 24
2 0 110 23
2 2 98 21
3 -3 155 25
3 0 120 23
4 -8 130 24
4 -7 112 23
5 -9 145 25
5 -4 123 24
5 1 100 23
5 3 85 21
6 0 90 21
6 1 85 20
;
run;
Thanks,
Sandyzman1
I think you want the values for any record wit y=0. But if a a PATID has no records with y=0, then you want the first record with the "least negative" value of y. (So what if a PATID only has positive y values).
Your DATA HAVE program data does not match the starting data in your table. I use the DATA step programming you provided:
data have;
input patid y wt bmi;
datalines;
1 -4 130 26
1 -2 120 25
1 0 100 24
1 1 99 22
2 -1 128 24
2 0 110 23
2 2 98 21
3 -3 155 25
3 0 120 23
4 -8 130 24
4 -7 112 23
5 -9 145 25
5 -4 123 24
5 1 100 23
5 3 85 21
6 0 90 21
6 1 85 20
;
run;
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by patid ;
retain _save_y;
if first.patid then call missing(_save_y);
if firstpass=1 and y<=0 then _save_y=max(y,_save_y);
if secondpass;
if y=_save_y then do;
post_wt=wt;
post_bmi=bmi;
end;
run;
The technique here is to pass through each PATID twice, first time to identify which value of Y is the selected record (identified with (_SAVE_Y). The second time to test for Y=_SAVE_Y and then copy WT and BMI.
This program assumes that the optimal Y value never occurs more than once for any PATID.
I think you want the values for any record wit y=0. But if a a PATID has no records with y=0, then you want the first record with the "least negative" value of y. (So what if a PATID only has positive y values).
Your DATA HAVE program data does not match the starting data in your table. I use the DATA step programming you provided:
data have;
input patid y wt bmi;
datalines;
1 -4 130 26
1 -2 120 25
1 0 100 24
1 1 99 22
2 -1 128 24
2 0 110 23
2 2 98 21
3 -3 155 25
3 0 120 23
4 -8 130 24
4 -7 112 23
5 -9 145 25
5 -4 123 24
5 1 100 23
5 3 85 21
6 0 90 21
6 1 85 20
;
run;
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by patid ;
retain _save_y;
if first.patid then call missing(_save_y);
if firstpass=1 and y<=0 then _save_y=max(y,_save_y);
if secondpass;
if y=_save_y then do;
post_wt=wt;
post_bmi=bmi;
end;
run;
The technique here is to pass through each PATID twice, first time to identify which value of Y is the selected record (identified with (_SAVE_Y). The second time to test for Y=_SAVE_Y and then copy WT and BMI.
This program assumes that the optimal Y value never occurs more than once for any PATID.
With a SAS data step looking back is easier to code for than looking ahead. If you sort your data by descending Y then using the lag() function will allow for such a look-back.
data have;
input patid y wt bmi;
datalines;
1 -4 130 26
1 -2 120 25
1 0 100 24
1 1 99 22
2 -1 128 24
2 0 110 23
2 2 98 21
3 -3 155 25
3 0 120 23
4 -8 130 24
4 -7 112 23
5 -9 145 25
5 -4 123 24
5 1 100 23
5 3 85 21
6 0 90 21
6 1 85 20
;
proc sort data=have out=inter;
by patid descending y;
run;
data want;
set inter;
by patid descending y;
post_flg=0;
if y<=0 and lag(y)>0 and patid=lag(patid) then post_flg=1;
/* not sure what should happen if there is only one row or */
/* if there are only positive values for Y */
else if first.patid and last.patid and y<=0 then post_flg=1;
/* now that the rows have been identified do whatever else you need */
if post_flg=1 then
do;
post_wt=wt;
post_bmi=bmi;
end;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.