I'm working with a dataset of lab values, and if an ID has at least 2 lab values within 7 days of each other, I would like to create a row of those values and associated dates. That way, I can look at the rows that have non-missing value2, value3, etc. Please see a simulated dataset below and the dataset I would like to create.
Table 1. Have
obs | id | date | value |
1 | 1 | 5/30/19 | 20 |
2 | 1 | 5/31/19 | 20 |
3 | 1 | 6/1/19 | 19 |
4 | 1 | 6/10/19 | 177 |
5 | 1 | 7/7/19 | 350 |
6 | 1 | 7/10/19 | 200 |
7 | 2 | 1/3/19 | 100 |
8 | 3 | 1/4/19 | 99 |
9 | 3 | 3/6/19 | 19 |
10 | 3 | 3/8/19 | 20 |
11 | 3 | 12/8/19 | 1500 |
Table 2. Want
obs | id | date1 | date2 | date3 | value1 | value2 | value3 |
1 | 1 | 5/30/19 | 5/31/19 | 6/1/19 | 20 | 20 | 19 |
2 | 1 | 6/10/19 | 177 | ||||
3 | 1 | 7/7/19 | 7/10/19 | 350 | 200 | ||
4 | 2 | 1/3/19 | 100 | ||||
5 | 3 | 1/4/19 | 99 | ||||
6 | 3 | 3/6/19 | 3/8/19 | 19 | 20 | ||
6 | 3 | 12/8/19 | 1500 |
Hi @andyec The idea is basically to group the intervals and transpose the contents of the group to wide structure.
data have;
infile datalines ;
input obs id date:MMDDYY10. value;
format date MMDDYY10.;
drop obs;
datalines;
1 1 5/30/19 20
2 1 5/31/19 20
3 1 6/1/19 19
4 1 6/10/19 177
5 1 7/7/19 350
6 1 7/10/19 200
7 2 1/3/19 100
8 3 1/4/19 99
9 3 3/6/19 19
10 3 3/8/19 20
11 3 12/8/19 1500
;
run;
data temp;
set have;
by id;
if dif(date)>7 then grp+1;
if first.id then grp=1;
run;
proc sql noprint;
select max(m) into :m trimmed
from (select max(grp) as m from temp group by id);
quit;
%put &=m;
data want;
do _n_= 1 by 1 until(last.grp);
set temp;
by id grp;
array dt date1-date&m;
array v value1-value&m;
dt(_n_)=date;
v(_n_)=value;
end;
format date: mmddyy10.;
drop date value grp;
run;
Hi @andyec
You can try this:
data have;
infile datalines dlm="09"x;
input obs id date:MMDDYY10. value;
format date MMDDYY10.;
datalines;
1 1 5/30/19 20
2 1 5/31/19 20
3 1 6/1/19 19
4 1 6/10/19 177
5 1 7/7/19 350
6 1 7/10/19 200
7 2 1/3/19 100
8 3 1/4/19 99
9 3 3/6/19 19
10 3 3/8/19 20
11 3 12/8/19 1500
;
run;
data have_flag;
do i=1 until(last.id);
set have;
format date7 MMDDYY10.;
by id;
retain date7;
if first.id then date7 = date;
if date-date7 > 7 then date7 = date;
end;
run;
proc transpose data=have_flag out=have_tr1 prefix=date;
var date;
by id date7;
run;
proc transpose data=have_flag out=have_tr2 prefix=value;
var value;
by id date7;
run;
data want;
merge have_tr1 have_tr2;
by id date7;
drop _name_ date7;
run;
Best,
Hi @andyec The idea is basically to group the intervals and transpose the contents of the group to wide structure.
data have;
infile datalines ;
input obs id date:MMDDYY10. value;
format date MMDDYY10.;
drop obs;
datalines;
1 1 5/30/19 20
2 1 5/31/19 20
3 1 6/1/19 19
4 1 6/10/19 177
5 1 7/7/19 350
6 1 7/10/19 200
7 2 1/3/19 100
8 3 1/4/19 99
9 3 3/6/19 19
10 3 3/8/19 20
11 3 12/8/19 1500
;
run;
data temp;
set have;
by id;
if dif(date)>7 then grp+1;
if first.id then grp=1;
run;
proc sql noprint;
select max(m) into :m trimmed
from (select max(grp) as m from temp group by id);
quit;
%put &=m;
data want;
do _n_= 1 by 1 until(last.grp);
set temp;
by id grp;
array dt date1-date&m;
array v value1-value&m;
dt(_n_)=date;
v(_n_)=value;
end;
format date: mmddyy10.;
drop date value grp;
run;
@ed_sas_member and @novinosrin thank you very much for your replies. I was able to get both to work with the simulated data provided.
@novinosrin when I use your code on the real dataset, I get an excessive number of value and date variables, and it seems to be the maximum amount of times an ID appears (not the maximum amount of times an ID appears to have a certain amount of values within 7 days). Any idea why this might be?
Lastly, I was wondering how I would add in one more additional variable that corresponds to every value. The variable is psource and can be seen in this post Count number of times within a timeframe by id, multiple occurrences by id possible .
Hi @andyec
Would the additional variable psource be splitted in psource1, psource2, psource3 as well?
Hi @andyec
I have added a proc transpose step at the end of the program, just before the merge:
data have_flag;
do i=1 until(last.id);
set have;
format date7 MMDDYY10.;
by id;
retain date7;
if first.id then date7 = date;
if date-date7 > 7 then date7 = date;
end;
run;
proc transpose data=have_flag out=have_tr1 prefix=date;
var date;
by id date7;
run;
proc transpose data=have_flag out=have_tr2 prefix=value;
var value;
by id date7;
run;
proc transpose data=have_flag out=have_tr3 prefix=psource;
var psource;
by id date7;
run;
data want;
merge have_tr1 have_tr2 have_tr3;
by id date7;
drop _name_ date7;
run;
Best,
Hi @andyec My apologies, I overlooked something and that's very silly of me. First try this change and test in full and let me know if this works, and then I shall delve into your additional requirement. Sorry again
proc sql noprint;
select max(m) into :m trimmed
from (select count(grp) as m from temp group by id,grp);
quit;
%put &=m;
Okay good @andyec Are you able to modify the code to get the additional variables too. Methinks, another array and assignment statement like the following should do
array p psource1-psource&m;
p(_n_)=psource;
should do. 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.