data vs ;
input patient cpevent$ vstest$ vsren vsorresu$;
cards ;
100 base pr 78 min
100 scrn hr 72 /min
100 week1 sbp 70 mmhg
100 week7 dbp 110 mm
100 week21 weight 75 kg
100 fwp height 120 kg
100 base pr 78 min
100 scrn hr 79 /mn
100 week1 sbp 70 mmhg
100 week7 dbp 110 mm
100 week21 weight 80 pounds
100 fwp height 75 kg
;
Condition below
If VSTEST having same VSREN and VSORRESU for 4 consecutive visits then highlight those with flag as
"dulipcate record"
Based on above condition and data , create a flag
Please anyone help me .....
Below uses amended sample data so it contains at least one case with 4 repeated values. I've also added a variable visit_date. I do hope that your real data contains sufficient data to ensure your analysis runs against data in the required sort order.
It would also help if you show us the desired result as this will clarify which rows you would want flagged (all of them or only from the 4th repetition onwards).
data have ;
infile datalines dsd dlm=' ' truncover;
input patient cpevent$ vstest$ vsren vsorresu$;
visit_date='01jan2024'd+_n_;
format visit_date date9.;
cards ;
100 base pr 78 min
100 base pr 78 min
100 scrn hr 72 /min
100 week1 sbp 70 mmhg
100 week7 dbp 110 mm
100 week21 weight 75 kg
100 fwp height 120 kg
100 base pr 78 min
100 base4 pr 78 min
100 base1 pr 78 min
100 base6 pr 78 min
100 base pr 78 min
100 scrn hr 79 /mn
100 week1 sbp 70 mmhg
100 week7 dbp 110 mm
100 week21 weight 80 pounds
100 fwp height 75 kg
;
data want;
set have;
by patient visit_date;
if not first.patient and lag(vsren)=vsren and lag(vsorresu)=vsorresu then occurrence_cnt+1;
else occurrence_cnt=1;
multi_flg= (occurrence_cnt>=4);
run;
proc print data=want;
run;
Please update your question to show the output you would want from the sample data you posted. Also please show the code you have tried. Both of those will help people help you. You might want to consider updating your sample data as well. If I'm understanding what you want, the sample data will not result in any duplicates being flagged.
Below uses amended sample data so it contains at least one case with 4 repeated values. I've also added a variable visit_date. I do hope that your real data contains sufficient data to ensure your analysis runs against data in the required sort order.
It would also help if you show us the desired result as this will clarify which rows you would want flagged (all of them or only from the 4th repetition onwards).
data have ;
infile datalines dsd dlm=' ' truncover;
input patient cpevent$ vstest$ vsren vsorresu$;
visit_date='01jan2024'd+_n_;
format visit_date date9.;
cards ;
100 base pr 78 min
100 base pr 78 min
100 scrn hr 72 /min
100 week1 sbp 70 mmhg
100 week7 dbp 110 mm
100 week21 weight 75 kg
100 fwp height 120 kg
100 base pr 78 min
100 base4 pr 78 min
100 base1 pr 78 min
100 base6 pr 78 min
100 base pr 78 min
100 scrn hr 79 /mn
100 week1 sbp 70 mmhg
100 week7 dbp 110 mm
100 week21 weight 80 pounds
100 fwp height 75 kg
;
data want;
set have;
by patient visit_date;
if not first.patient and lag(vsren)=vsren and lag(vsorresu)=vsorresu then occurrence_cnt+1;
else occurrence_cnt=1;
multi_flg= (occurrence_cnt>=4);
run;
proc print data=want;
run;
The only way this makes any sense if you process it BY the PATIENT and the VSTEST (vital sign test name). Otherwise you might be comparing Systolic BP readings to Diastolic BP readings and flagging false repetitions.
So you will need to sort your data, which means you need a sortable variable. Normally that would be some type of visit date or reading date for each observation. For the purpose of an example let's just assume your example data is already in order and just number the observations to create an ordering variable, let's call it ROW.
data vs ;
row+1;
input patient cpevent$ vstest$ vsren vsorresu$;
cards ;
100 base pr 78 min
100 scrn hr 72 /min
100 week1 sbp 70 mmhg
100 week7 dbp 110 mm
100 week21 weight 75 kg
100 fwp height 120 kg
100 base pr 78 min
100 scrn hr 79 /mn
100 week1 sbp 70 mmhg
100 week7 dbp 110 mm
100 week21 weight 80 pounds
100 fwp height 75 kg
;
Now we can sort the data into an order than can be used to find the repetitious values.
proc sort data=vs ;
by patient vstest row ;
run;
Then we can use a double DOW loop to count how many repetitions, set your duplicate FLAG, and then re-read the data so we can output the FLAG on every observation in the group.
data want;
do reps=1 by 1 until(last.vsorresu);
set vs;
by patient vstest vsren vsorresu notsorted;
end;
dup_flag = (reps>= 4);
do _n_=1 to reps;
set vs;
output;
end;
run;
None of the observations in your data meet the 4 repeated values criteria. The most reps is only 2. So if we change the criteria to look for 2 or more reps:
dup_flag = (reps>= 2);
We get this result:
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.