DOSE=1 at TIME=0 represents the first dosing event. Subsequent dosing event is represented by DOSE=1. I want to create a variable “SERIAL” which identify serial samples from non-serial samples. Non serial samples will be coded as SAMPLE=1 and Serial samples: Number groups of serial samples sequentially with consecutive integers starting from 2 (i.e., SERIAL= 2, 3, ..). Serial samples: a group of >=3 consecutive samples (rows) associated with the same dose (pre-dose and all post-dose samples). Samples (or rows) with the same “VISIT” value belong to the same serial sample group if they are associated with the same dose (including the pre-dose sample) or were taken within <=192 h (TIME) of the first dose in the serial sampling group. SERIAL= . if DOSE=1. In this calculations rows where C1 != . & C2 = 1 & C3 equal to 1 or 3 should be ignored. I am struggling to implement this in SAS code. Any help is greatly appreciated. Thanks
ID | C1 | C2 | C3 | TIME | DOSE | VISIT | SERIAL |
2 | . | 1 | 1 | 0 | 1 | 1 | . |
2 | . | 0 | 2 | 2 | 0 | 1 | 1 |
2 | . | 0 | 4 | 2 | 0 | 1 | 1 |
2 | . | 1 | 1 | 96 | 1 | 1 | . |
2 | . | 0 | 2 | 100 | 0 | 1 | 2 |
2 | . | 0 | 4 | 100 | 0 | 1 | 2 |
2 | . | 0 | 2 | 232 | 0 | 1 | 2 |
2 | . | 0 | 4 | 232 | 0 | 1 | 2 |
2 | . | 0 | 2 | 258 | 0 | 2 | 2 |
2 | . | 0 | 4 | 258 | 0 | 2 | 2 |
2 | . | 0 | 2 | 267 | 0 | 2 | 2 |
2 | . | 0 | 4 | 267 | 0 | 2 | 2 |
2 | 4 | 1 | 2 | 604 | 0 | 3 | 1 |
2 | 4 | 1 | 4 | 604 | 0 | 3 | 1 |
2 | 4 | 1 | 2 | 963 | 0 | 4 | 1 |
2 | 4 | 1 | 4 | 963 | 0 | 4 | 1 |
2 | 4 | 1 | 2 | 1637 | 0 | 5 | 1 |
2 | 4 | 1 | 4 | 1637 | 0 | 5 | 1 |
1 | 4 | 1 | 2 | 0 | 0 | 1 | 2 |
1 | 4 | 1 | 4 | 0 | 0 | 1 | 2 |
1 | . | 1 | 1 | 0 | 1 | 1 | . |
1 | . | 0 | 2 | 1 | 0 | 1 | 2 |
1 | 4 | 1 | 4 | 1 | 0 | 1 | 2 |
1 | . | 0 | 2 | 4 | 0 | 1 | 2 |
1 | . | 0 | 4 | 4 | 0 | 1 | 2 |
1 | . | 0 | 2 | 8 | 0 | 1 | 2 |
1 | . | 0 | 4 | 8 | 0 | 1 | 2 |
1 | . | 0 | 2 | 12 | 0 | 1 | 2 |
1 | . | 0 | 4 | 12 | 0 | 1 | 2 |
1 | . | 0 | 2 | 24 | 0 | 1 | 2 |
1 | . | 0 | 4 | 24 | 0 | 1 | 2 |
1 | . | 0 | 2 | 48 | 0 | 1 | 2 |
1 | . | 0 | 4 | 48 | 0 | 1 | 2 |
1 | . | 1 | 1 | 72 | 1 | 1 | 2 |
1 | . | 0 | 2 | 72 | 0 | 1 | 2 |
1 | . | 0 | 4 | 72 | 0 | 1 | 2 |
1 | . | 0 | 2 | 167 | 0 | 1 | 2 |
1 | . | 0 | 4 | 167 | 0 | 1 | 2 |
1 | . | 1 | 1 | 169 | 1 | 1 | . |
1 | . | 1 | 1 | 240 | 1 | 1 | . |
1 | . | 0 | 2 | 335 | 0 | 2 | 1 |
1 | . | 0 | 4 | 335 | 0 | 2 | 1 |
1 | . | 1 | 1 | 337 | 1 | 2 | . |
1 | . | 1 | 1 | 434 | 1 | 2 | . |
1 | . | 0 | 2 | 503 | 0 | 3 | 1 |
1 | . | 0 | 4 | 503 | 0 | 3 | 1 |
1 | . | 1 | 1 | 503 | 1 | 3 | . |
1 | 4 | 1 | 2 | 669 | 0 | 4 | 1 |
1 | . | 0 | 4 | 669 | 0 | 4 | 1 |
1 | . | 1 | 1 | 773 | 1 | 4 | . |
1 | . | 1 | 1 | 888 | 1 | 4 | . |
1 | . | 1 | 1 | 940 | 1 | 4 | . |
1 | . | 0 | 2 | 1006 | 0 | 5 | 1 |
1 | . | 0 | 4 | 1006 | 0 | 5 | 1 |
1 | . | 1 | 1 | 1037 | 1 | 5 | . |
1 | . | 1 | 1 | 1108 | 1 | 5 | . |
1 | . | 1 | 1 | 1206 | 1 | 5 | . |
1 | . | 1 | 1 | 1282 | 1 | 5 | . |
1 | . | 0 | 2 | 1342 | 0 | 6 | 1 |
1 | . | 0 | 4 | 1342 | 0 | 6 | 1 |
1 | . | 1 | 1 | 1373 | 1 | 6 | . |
1 | . | 1 | 1 | 1441 | 1 | 6 | . |
1 | . | 1 | 1 | 1513 | 1 | 6 | . |
1 | 4 | 1 | 2 | 1679 | 0 | 7 | 1 |
1 | 4 | 1 | 4 | 1679 | 0 | 7 | 1 |
1 | . | 1 | 1 | 1680 | 1 | 7 | . |
1 | . | 1 | 1 | 1752 | 1 | 7 | . |
1 | . | 1 | 1 | 1850 | 1 | 7 | . |
1 | . | 1 | 1 | 1921 | 1 | 7 | . |
1 | 4 | 1 | 2 | 2014 | 0 | 8 | 1 |
1 | . | 0 | 4 | 2014 | 0 | 8 | 1 |
1 | . | 1 | 1 | 2018 | 1 | 8 | . |
1 | . | 1 | 1 | 2089 | 1 | 8 | . |
1 | . | 1 | 1 | 2185 | 1 | 8 | . |
1 | . | 1 | 1 | 2477 | 1 | 8 | . |
1 | . | 1 | 1 | 2547 | 1 | 8 | . |
1 | . | 1 | 1 | 2643 | 1 | 8 | . |
1 | . | 0 | 2 | 2686 | 0 | 8 | 1 |
1 | . | 0 | 4 | 2686 | 0 | 8 | 1 |
1 | . | 1 | 1 | 2715 | 1 | 8 | . |
1 | . | 1 | 1 | 2811 | 1 | 8 | . |
1 | . | 1 | 1 | 2906 | 1 | 8 | . |
1 | . | 1 | 1 | 2951 | 1 | 8 | . |
1 | . | 1 | 1 | 3047 | 1 | 8 | . |
1 | . | 1 | 1 | 3118 | 1 | 8 | . |
1 | . | 1 | 1 | 3215 | 1 | 8 | . |
1 | . | 1 | 1 | 3286 | 1 | 8 | . |
1 | 4 | 1 | 2 | 3360 | 0 | 8 | 1 |
1 | . | 0 | 4 | 3360 | 0 | 8 | 1 |
1 | . | 1 | 1 | 3382 | 1 | 8 | . |
1 | . | 1 | 1 | 5574 | 1 | 8 | . |
1 | . | 1 | 1 | 5639 | 1 | 8 | . |
1 | . | 1 | 1 | 5734 | 1 | 8 | . |
1 | . | 1 | 1 | 5808 | 1 | 8 | . |
1 | . | 1 | 1 | 5902 | 1 | 8 | . |
1 | . | 1 | 1 | 5977 | 1 | 8 | . |
1 | . | 1 | 1 | 6070 | 1 | 8 | . |
1 | . | 1 | 1 | 6143 | 1 | 8 | . |
1 | . | 1 | 1 | 6239 | 1 | 8 | . |
1 | . | 1 | 1 | 6311 | 1 | 8 | . |
1 | . | 1 | 1 | 6413 | 1 | 8 | . |
1 | . | 1 | 1 | 6485 | 1 | 8 | . |
In your post you mentioned:
1) I want to create a variable “SERIAL”
2) SERIAL= . if DOSE=1.
I couldn't understand your other rules to calculate SERIAL.
As your posted data has already a colomn named SERIAL,
I tried to build a code that create a new variable named calc_sn,
that as much as I understand should be equal to the posted variable SERIAL.
If you check next code log, you will see that first 18 rows has calc_sn = serial.
Maybe you can complete the program by yoursef, if not -
what are the rules to add 1 to the calc_sn or to reset it to 1.
Does ID have any part in those rules ?
data have;
input
ID C1 C2 C3 TIME DOSE VISIT SERIAL;
datalines;
2 . 1 1 0 1 1 .
2 . 0 2 2 0 1 1
2 . 0 4 2 0 1 1
2 . 1 1 96 1 1 .
2 . 0 2 100 0 1 2
2 . 0 4 100 0 1 2
2 . 0 2 232 0 1 2
2 . 0 4 232 0 1 2
2 . 0 2 258 0 2 2
2 . 0 4 258 0 2 2
2 . 0 2 267 0 2 2
2 . 0 4 267 0 2 2
2 4 1 2 604 0 3 1
2 4 1 4 604 0 3 1
2 4 1 2 963 0 4 1
2 4 1 4 963 0 4 1
2 4 1 2 1637 0 5 1
2 4 1 4 1637 0 5 1
1 4 1 2 0 0 1 2
1 4 1 4 0 0 1 2
1 . 1 1 0 1 1 .
1 . 0 2 1 0 1 2
1 4 1 4 1 0 1 2
1 . 0 2 4 0 1 2
1 . 0 4 4 0 1 2
1 . 0 2 8 0 1 2
1 . 0 4 8 0 1 2
1 . 0 2 12 0 1 2
1 . 0 4 12 0 1 2
1 . 0 2 24 0 1 2
1 . 0 4 24 0 1 2
1 . 0 2 48 0 1 2
1 . 0 4 48 0 1 2
1 . 1 1 72 1 1 2
1 . 0 2 72 0 1 2
1 . 0 4 72 0 1 2
1 . 0 2 167 0 1 2
1 . 0 4 167 0 1 2
1 . 1 1 169 1 1 .
1 . 1 1 240 1 1 .
1 . 0 2 335 0 2 1
1 . 0 4 335 0 2 1
1 . 1 1 337 1 2 .
1 . 1 1 434 1 2 .
1 . 0 2 503 0 3 1
1 . 0 4 503 0 3 1
1 . 1 1 503 1 3 .
1 4 1 2 669 0 4 1
1 . 0 4 669 0 4 1
1 . 1 1 773 1 4 .
1 . 1 1 888 1 4 .
1 . 1 1 940 1 4 .
1 . 0 2 1006 0 5 1
1 . 0 4 1006 0 5 1
1 . 1 1 1037 1 5 .
1 . 1 1 1108 1 5 .
1 . 1 1 1206 1 5 .
1 . 1 1 1282 1 5 .
1 . 0 2 1342 0 6 1
1 . 0 4 1342 0 6 1
1 . 1 1 1373 1 6 .
1 . 1 1 1441 1 6 .
1 . 1 1 1513 1 6 .
1 4 1 2 1679 0 7 1
1 4 1 4 1679 0 7 1
1 . 1 1 1680 1 7 .
1 . 1 1 1752 1 7 .
1 . 1 1 1850 1 7 .
1 . 1 1 1921 1 7 .
1 4 1 2 2014 0 8 1
1 . 0 4 2014 0 8 1
1 . 1 1 2018 1 8 .
1 . 1 1 2089 1 8 .
1 . 1 1 2185 1 8 .
1 . 1 1 2477 1 8 .
1 . 1 1 2547 1 8 .
1 . 1 1 2643 1 8 .
1 . 0 2 2686 0 8 1
1 . 0 4 2686 0 8 1
1 . 1 1 2715 1 8 .
1 . 1 1 2811 1 8 .
1 . 1 1 2906 1 8 .
1 . 1 1 2951 1 8 .
1 . 1 1 3047 1 8 .
1 . 1 1 3118 1 8 .
1 . 1 1 3215 1 8 .
1 . 1 1 3286 1 8 .
1 4 1 2 3360 0 8 1
1 . 0 4 3360 0 8 1
1 . 1 1 3382 1 8 .
1 . 1 1 5574 1 8 .
1 . 1 1 5639 1 8 .
1 . 1 1 5734 1 8 .
1 . 1 1 5808 1 8 .
1 . 1 1 5902 1 8 .
1 . 1 1 5977 1 8 .
1 . 1 1 6070 1 8 .
1 . 1 1 6143 1 8 .
1 . 1 1 6239 1 8 .
1 . 1 1 6311 1 8 .
1 . 1 1 6413 1 8 .
1 . 1 1 6485 1 8 .
; run;
data want;
set have;
by notsorted id;
retain sn 0
calc_sn; /* calculated serial number */
if first.id then sn=0;
if dose=1 then do;
calc_sn=.; output;
sn+1;
return;
end;
else calc_sn = sn;
if c1 ne . and c2=1 and c3 not in (1, 3) then do;
calc_sn=1; sn=1;
end;
output;
/* check */
if calc_sn ne serial then put _N_= serial= calc_sn=;
drop sn;
run;
@Shmuel: Thank you.
TIME, DOSE are important here. Several doses (>1) are given per subject and each dose is given in time. For instance, ID=2, received doses at time=0 and time=96. dose=1 and serial=. represent those dosing events. Serial samples are >= 3 consecutive samples (DOSE=0) associated with same dose (including predose samples). The samples at TIME=2 were set to SERIAL=1 as they are non-consecutive samples (<=3 consecutive samples), because at TIME=96 next dose is given. Following dose at TIME=96, there are serial samples (>=3) or <= 192 hr from the first dose (from TIME=100 to TIME=267), and hence these serial samples are coded as 2. Samples from TIME=604 to TIME=1637 are > 192 hr from first dose and hence were set to SERIAL=1.
For ID=1, First dose is given at TIME=0, second dose at TIME=72 & third dose at TIME=169. Samples between TIME=1 and TIME=169 are >= 3 consecutive samples or within <=192 hr from the first dose and hence are coded SERIAL=2. DOSES given at later TIME points such as 169, 240, 337 etc.. do not have >=3 samples subsequent to them and hence are coded as SERIAL=1. In case if >=3 consecutive samples are present after TIME=169, those samples will be coded as SERIAL=3 and so on.
Hope this information helps. Please let me know if you have any questions.
Thank you.
@ari, I have been too busy yesterday and today, so I want just to clarify some more points:
- Is ID the subject ID ?
- Do you mean by "sample" - a row/observation in the dataset ?
- If I want to sort the dataset by subject, what other variables I need
beyond ID (as subject ) and time - as there are "dupliacate" lines, regarding ID Time only.
- AmI right, to count consecutive samples I need count rows with DOSE=0 between
instances of DOSE=1.
- Would you say: if there are less than 3 consecutive samples then its Serial=1
otherwise Serial count from 2 on by 1 (2, 3, 4...) per subject (ID) ?
I hope I'll have time to continue tomorrow.
@Shmuel. Thanks for the response.
Yes, Id is the subject ID and sample refers to row/observation.
Sort order is: ID TIME DOSE C3 can be used to sort the records
Yes, consecutive samples can be identified by counting rows with DOSE=0 between instances of DOSE=1
if there are less than 3 consecutive samples then its Serial=1 otherwise Serial count from 2 on by 1 (2, 3, 4...) per subject (ID) ?
Yes, for instance, if there are 3 groups of 4 consecutive samples (or rows) are found per subject. 1st group wil be SERIAL=2, 2nd group will be SERIAL=3 and so on.
Thank you.
Please run next code. No changes to step 1 - creating HAVE dataset.
There are still few observations with serial NE cal_sn (calculated serial number);
Check dataset CHECK.
proc sort data=have out=haves; by id time dose c3; run;
data count_cs(rename=(k1=id k2=time k3=dose k4=c3));
set haves end=eof;
by id time dose c3;
retain phase 0 /* 0 1st time only */
k1 k2 k3 k4
count_cs; /* count consecutive samples */
if phase=1 and dose=1
or last.id
or eof then output;
if dose=1 then do;
phase=1;
k1=id; k2=time; k3=dose; k4=c3;
count_cs=0;
end;
if dose=0 then count_cs+1;
keep k1 k2 k3 k4 count_cs;
run;
data want;
merge haves
count_cs(rename=(count_cs=count));
by id time dose c3;
retain count_cs sn dose_time
calc_sn; /* sn = serial number */
lag_time = lag(time);
if first.id then sn=0;
if dose=1 then do;
dose_time=time;
count_cs=count;
calc_sn=.; output;
if count_cs < 3
then calc_sn=1;
else do;
if sn=0 then sn=2; else sn+1;
calc_sn=sn;
end;
end;
else do;
if time - dose_time > 192 then calc_sn=1;
output;
end;
drop sn count;
run;
data check; /* should contain 0 obs */
set want;
if serial ne calc_sn;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.