I need to create a dataset on which there are two systolic blood pressure measurements taken from the same individual. I need to take only the lowest of the two measurements only if the measurements are taken within 4 weeks from each other. If the measurements are not within four weeks then the measurement is not taken. The names and date of birth are not real.
ID Name Date of Birth Systolic Blood Pressure Date of measurement
0001 abc 12/12/1965 145 8/10/2010
0002 abc 12/12/1965 139 8/20/2010
0003 def 8/30/1962 132 8/04/2010
0004 def 8/30/1962 137 8/21/2010
0005 ghi 3/12/1964 140 02/07/2010
0006 ghi 3/12/1964 129 03/17/2010
007 jkl 5/10/1963 127 01/07/2010
008 jkl 5/10/1963 121 02/13/2010
As can be seen, abc, def, ghi and jkl are four different persons. Each of them have two readings of systolic blood pressure. I need to write SAS program to take the lowest of the two measurements, only if the date of measurements are within four weeks from each other. If not, then they are not included. In this case the first two are seen to meet that criteria of four weeks while the last two does not.
Any help will be appreciated. Thanks in advance.
This may get you started:
proc summary data=have nway;
class name date_of_birth;
var Systolic_Blood_Pressure;
output out=temp (drop=_type_ _freq_) min=
idgroup (min(Systolic_Blood_Pressure) out[2]
(Systolic_Blood_Pressure id Date_of_measurement)=)
/autolabel autoname;
run;
This gets the two lowest values.
Since you don't guarantee there are two or more readings the deduplication may be a bit more.
You will have two variable in the output set above for each date of measurement.
It could be:
data want;
set temp;
where Date_of_measurement_1 - Date_of_measurement2 le 28;
run;
Thank you. Your code did help in getting started but I still need to get the desired outputs which are 1) first get the two measurements per person that are 12 weeks apart and then 2) get rid of the one that is the highest (and keep the lowest). I am playing around with this code though to get to the outputs.
Well, depends on your definitions, does it mean that the two observations are exactly 12 weeks apart? Will there only be 2 results, or many results, what happens if more than 2 results overlapping etc. Clearly define the logic needed first. If its just two and they are exact:
proc sql;
create table WANT as
select COALESCE(A.ID,B.ID) as ID,
...
case when A.SYSBP is not null and A.SYSBP < B.SYSBP then A.SYSBP
else B.SYSBP end as SYSBP
from HAVE A
full join HAVE B
on A.ID=B.ID
and A.DOB=B.DOB
and A.NAME=B.NAME
and ...
and B.MEAS=intnx('week',A.MEAS,12,'same');
quit;
Thank you. It is 4 weeks and not 12 weeks. But they need not be exactly 4 weeks apart. The dates could be within 4 weeks. There will be only two results, and not multiple. Your code is helpful if it needed an exact time difference (4 weeks in this case). thanks again.
12 week? earlier it was 4 weeks...
The data set I show has 2 minimum values and the associated dates. I thought you might be able to determine which ones were of interest by comparing the dates.
I think you are possibly hung up on "deduplicate first" then find min as the order to accomplish the task. Sometimes the order may work better in a different sequence.
Thank you. It is 4 weeks. Sorry for the typo. I will try the different sequence as you suggested.
You need to post the output you need according to your sample data. Otherwise We just are guessing what you want.
data have; input ID $ Name $ DateofBirth : mmddyy12. SystolicBloodPressure Dateofmeasurement : mmddyy12.; format DateofBirth Dateofmeasurement mmddyy10.; cards; 0001 abc 12/12/1965 145 8/10/2010 0002 abc 12/12/1965 139 8/20/2010 0003 def 8/30/1962 132 8/04/2010 0004 def 8/30/1962 137 8/21/2010 0005 ghi 3/12/1964 140 02/07/2010 0006 ghi 3/12/1964 129 03/17/2010 007 jkl 5/10/1963 127 01/07/2010 008 jkl 5/10/1963 121 02/13/2010 ; run; proc sql; create table want as select * from have group by Name having range(Dateofmeasurement) le 28 and SystolicBloodPressure=min(SystolicBloodPressure); quit;
Xia Keshan
Thank you much. Here is the desired final output:
id name DateofBirth SystolicBloodPressire DateofMeasurement
0001 abc 12/12/1965 139 8/20/2010
0003 def 8/30/1962 132 8/04/2010
To get this output there are two steps I guess. The first step will remove all the duplicates that have blood pressure measurements that are beyond 4 weeks apart from each other and keep those duplicates' measurements that are
within 4 weeks.
The second step would then give the above output where only the lowest of the measurement is kept.
Your SQL code will get me the second step i.e. the lowest of the two duplicate's measurement, but may not get me the first desired output which is list of duplicates that have measurement within 4 weeks and that are beyond four weeks.
No. I can get you the first desired output . The question is what output you are looking for ?
proc sql;
create table want as
select *
from have
group by Name
having range(Dateofmeasurement) le 28 ;
quit;
Thank you very much. This was helpful.
Suppose you have a patient with 16 readings all exactly 3 weeks apart. Which are kept by your deduplication rule?
Also you did not specify that your output requirement was a list of duplicates. Plus you need to provide a much more comprehensive example to show what you mean by
duplicates that have measurement within 4 weeks and that are beyond four weeks.
because that at face value is a null set.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.