DOSE=1 at TIME=0 represents the first dosing event. Subsequent dosing event is represented by DOSE=1. I want to create a variable “SERIAL” which identify serial samples from non-serial samples. Non serial samples will be coded as SAMPLE=1 and Serial samples: Number groups of serial samples sequentially with consecutive integers starting from 2 (i.e., SERIAL= 2, 3, ..). Serial samples: a group of >=3 consecutive samples (rows) associated with the same dose (pre-dose and all post-dose samples). Samples (or rows) with the same “VISIT” value belong to the same serial sample group if they are associated with the same dose (including the pre-dose sample) or were taken within <=192 h (TIME) of the first dose in the serial sampling group. SERIAL= . if DOSE=1. In this calculations rows where C1 != . & C2 = 1 & C3 equal to 1 or 3 should be ignored. I am struggling to implement this in SAS code. Any help is greatly appreciated. Thanks
ID | C1 | C2 | C3 | TIME | DOSE | VISIT | SERIAL |
2 | . | 1 | 1 | 0 | 1 | 1 | . |
2 | . | 0 | 2 | 2 | 0 | 1 | 1 |
2 | . | 0 | 4 | 2 | 0 | 1 | 1 |
2 | . | 1 | 1 | 96 | 1 | 1 | . |
2 | . | 0 | 2 | 100 | 0 | 1 | 2 |
2 | . | 0 | 4 | 100 | 0 | 1 | 2 |
2 | . | 0 | 2 | 232 | 0 | 1 | 2 |
2 | . | 0 | 4 | 232 | 0 | 1 | 2 |
2 | . | 0 | 2 | 258 | 0 | 2 | 2 |
2 | . | 0 | 4 | 258 | 0 | 2 | 2 |
2 | . | 0 | 2 | 267 | 0 | 2 | 2 |
2 | . | 0 | 4 | 267 | 0 | 2 | 2 |
2 | 4 | 1 | 2 | 604 | 0 | 3 | 1 |
2 | 4 | 1 | 4 | 604 | 0 | 3 | 1 |
2 | 4 | 1 | 2 | 963 | 0 | 4 | 1 |
2 | 4 | 1 | 4 | 963 | 0 | 4 | 1 |
2 | 4 | 1 | 2 | 1637 | 0 | 5 | 1 |
2 | 4 | 1 | 4 | 1637 | 0 | 5 | 1 |
1 | 4 | 1 | 2 | 0 | 0 | 1 | 2 |
1 | 4 | 1 | 4 | 0 | 0 | 1 | 2 |
1 | . | 1 | 1 | 0 | 1 | 1 | . |
1 | . | 0 | 2 | 1 | 0 | 1 | 2 |
1 | 4 | 1 | 4 | 1 | 0 | 1 | 2 |
1 | . | 0 | 2 | 4 | 0 | 1 | 2 |
1 | . | 0 | 4 | 4 | 0 | 1 | 2 |
1 | . | 0 | 2 | 8 | 0 | 1 | 2 |
1 | . | 0 | 4 | 8 | 0 | 1 | 2 |
1 | . | 0 | 2 | 12 | 0 | 1 | 2 |
1 | . | 0 | 4 | 12 | 0 | 1 | 2 |
1 | . | 0 | 2 | 24 | 0 | 1 | 2 |
1 | . | 0 | 4 | 24 | 0 | 1 | 2 |
1 | . | 0 | 2 | 48 | 0 | 1 | 2 |
1 | . | 0 | 4 | 48 | 0 | 1 | 2 |
1 | . | 1 | 1 | 72 | 1 | 1 | 2 |
1 | . | 0 | 2 | 72 | 0 | 1 | 2 |
1 | . | 0 | 4 | 72 | 0 | 1 | 2 |
1 | . | 0 | 2 | 167 | 0 | 1 | 2 |
1 | . | 0 | 4 | 167 | 0 | 1 | 2 |
1 | . | 1 | 1 | 169 | 1 | 1 | . |
1 | . | 1 | 1 | 240 | 1 | 1 | . |
1 | . | 0 | 2 | 335 | 0 | 2 | 1 |
1 | . | 0 | 4 | 335 | 0 | 2 | 1 |
1 | . | 1 | 1 | 337 | 1 | 2 | . |
1 | . | 1 | 1 | 434 | 1 | 2 | . |
1 | . | 0 | 2 | 503 | 0 | 3 | 1 |
1 | . | 0 | 4 | 503 | 0 | 3 | 1 |
1 | . | 1 | 1 | 503 | 1 | 3 | . |
1 | 4 | 1 | 2 | 669 | 0 | 4 | 1 |
1 | . | 0 | 4 | 669 | 0 | 4 | 1 |
1 | . | 1 | 1 | 773 | 1 | 4 | . |
1 | . | 1 | 1 | 888 | 1 | 4 | . |
1 | . | 1 | 1 | 940 | 1 | 4 | . |
1 | . | 0 | 2 | 1006 | 0 | 5 | 1 |
1 | . | 0 | 4 | 1006 | 0 | 5 | 1 |
1 | . | 1 | 1 | 1037 | 1 | 5 | . |
1 | . | 1 | 1 | 1108 | 1 | 5 | . |
1 | . | 1 | 1 | 1206 | 1 | 5 | . |
1 | . | 1 | 1 | 1282 | 1 | 5 | . |
1 | . | 0 | 2 | 1342 | 0 | 6 | 1 |
1 | . | 0 | 4 | 1342 | 0 | 6 | 1 |
1 | . | 1 | 1 | 1373 | 1 | 6 | . |
1 | . | 1 | 1 | 1441 | 1 | 6 | . |
1 | . | 1 | 1 | 1513 | 1 | 6 | . |
1 | 4 | 1 | 2 | 1679 | 0 | 7 | 1 |
1 | 4 | 1 | 4 | 1679 | 0 | 7 | 1 |
1 | . | 1 | 1 | 1680 | 1 | 7 | . |
1 | . | 1 | 1 | 1752 | 1 | 7 | . |
1 | . | 1 | 1 | 1850 | 1 | 7 | . |
1 | . | 1 | 1 | 1921 | 1 | 7 | . |
1 | 4 | 1 | 2 | 2014 | 0 | 8 | 1 |
1 | . | 0 | 4 | 2014 | 0 | 8 | 1 |
1 | . | 1 | 1 | 2018 | 1 | 8 | . |
1 | . | 1 | 1 | 2089 | 1 | 8 | . |
1 | . | 1 | 1 | 2185 | 1 | 8 | . |
1 | . | 1 | 1 | 2477 | 1 | 8 | . |
1 | . | 1 | 1 | 2547 | 1 | 8 | . |
1 | . | 1 | 1 | 2643 | 1 | 8 | . |
1 | . | 0 | 2 | 2686 | 0 | 8 | 1 |
1 | . | 0 | 4 | 2686 | 0 | 8 | 1 |
1 | . | 1 | 1 | 2715 | 1 | 8 | . |
1 | . | 1 | 1 | 2811 | 1 | 8 | . |
1 | . | 1 | 1 | 2906 | 1 | 8 | . |
1 | . | 1 | 1 | 2951 | 1 | 8 | . |
1 | . | 1 | 1 | 3047 | 1 | 8 | . |
1 | . | 1 | 1 | 3118 | 1 | 8 | . |
1 | . | 1 | 1 | 3215 | 1 | 8 | . |
1 | . | 1 | 1 | 3286 | 1 | 8 | . |
1 | 4 | 1 | 2 | 3360 | 0 | 8 | 1 |
1 | . | 0 | 4 | 3360 | 0 | 8 | 1 |
1 | . | 1 | 1 | 3382 | 1 | 8 | . |
1 | . | 1 | 1 | 5574 | 1 | 8 | . |
1 | . | 1 | 1 | 5639 | 1 | 8 | . |
1 | . | 1 | 1 | 5734 | 1 | 8 | . |
1 | . | 1 | 1 | 5808 | 1 | 8 | . |
1 | . | 1 | 1 | 5902 | 1 | 8 | . |
1 | . | 1 | 1 | 5977 | 1 | 8 | . |
1 | . | 1 | 1 | 6070 | 1 | 8 | . |
1 | . | 1 | 1 | 6143 | 1 | 8 | . |
1 | . | 1 | 1 | 6239 | 1 | 8 | . |
1 | . | 1 | 1 | 6311 | 1 | 8 | . |
1 | . | 1 | 1 | 6413 | 1 | 8 | . |
1 | . | 1 | 1 | 6485 | 1 | 8 | . |
In your post you mentioned:
1) I want to create a variable “SERIAL”
2) SERIAL= . if DOSE=1.
I couldn't understand your other rules to calculate SERIAL.
As your posted data has already a colomn named SERIAL,
I tried to build a code that create a new variable named calc_sn,
that as much as I understand should be equal to the posted variable SERIAL.
If you check next code log, you will see that first 18 rows has calc_sn = serial.
Maybe you can complete the program by yoursef, if not -
what are the rules to add 1 to the calc_sn or to reset it to 1.
Does ID have any part in those rules ?
data have;
input
ID C1 C2 C3 TIME DOSE VISIT SERIAL;
datalines;
2 . 1 1 0 1 1 .
2 . 0 2 2 0 1 1
2 . 0 4 2 0 1 1
2 . 1 1 96 1 1 .
2 . 0 2 100 0 1 2
2 . 0 4 100 0 1 2
2 . 0 2 232 0 1 2
2 . 0 4 232 0 1 2
2 . 0 2 258 0 2 2
2 . 0 4 258 0 2 2
2 . 0 2 267 0 2 2
2 . 0 4 267 0 2 2
2 4 1 2 604 0 3 1
2 4 1 4 604 0 3 1
2 4 1 2 963 0 4 1
2 4 1 4 963 0 4 1
2 4 1 2 1637 0 5 1
2 4 1 4 1637 0 5 1
1 4 1 2 0 0 1 2
1 4 1 4 0 0 1 2
1 . 1 1 0 1 1 .
1 . 0 2 1 0 1 2
1 4 1 4 1 0 1 2
1 . 0 2 4 0 1 2
1 . 0 4 4 0 1 2
1 . 0 2 8 0 1 2
1 . 0 4 8 0 1 2
1 . 0 2 12 0 1 2
1 . 0 4 12 0 1 2
1 . 0 2 24 0 1 2
1 . 0 4 24 0 1 2
1 . 0 2 48 0 1 2
1 . 0 4 48 0 1 2
1 . 1 1 72 1 1 2
1 . 0 2 72 0 1 2
1 . 0 4 72 0 1 2
1 . 0 2 167 0 1 2
1 . 0 4 167 0 1 2
1 . 1 1 169 1 1 .
1 . 1 1 240 1 1 .
1 . 0 2 335 0 2 1
1 . 0 4 335 0 2 1
1 . 1 1 337 1 2 .
1 . 1 1 434 1 2 .
1 . 0 2 503 0 3 1
1 . 0 4 503 0 3 1
1 . 1 1 503 1 3 .
1 4 1 2 669 0 4 1
1 . 0 4 669 0 4 1
1 . 1 1 773 1 4 .
1 . 1 1 888 1 4 .
1 . 1 1 940 1 4 .
1 . 0 2 1006 0 5 1
1 . 0 4 1006 0 5 1
1 . 1 1 1037 1 5 .
1 . 1 1 1108 1 5 .
1 . 1 1 1206 1 5 .
1 . 1 1 1282 1 5 .
1 . 0 2 1342 0 6 1
1 . 0 4 1342 0 6 1
1 . 1 1 1373 1 6 .
1 . 1 1 1441 1 6 .
1 . 1 1 1513 1 6 .
1 4 1 2 1679 0 7 1
1 4 1 4 1679 0 7 1
1 . 1 1 1680 1 7 .
1 . 1 1 1752 1 7 .
1 . 1 1 1850 1 7 .
1 . 1 1 1921 1 7 .
1 4 1 2 2014 0 8 1
1 . 0 4 2014 0 8 1
1 . 1 1 2018 1 8 .
1 . 1 1 2089 1 8 .
1 . 1 1 2185 1 8 .
1 . 1 1 2477 1 8 .
1 . 1 1 2547 1 8 .
1 . 1 1 2643 1 8 .
1 . 0 2 2686 0 8 1
1 . 0 4 2686 0 8 1
1 . 1 1 2715 1 8 .
1 . 1 1 2811 1 8 .
1 . 1 1 2906 1 8 .
1 . 1 1 2951 1 8 .
1 . 1 1 3047 1 8 .
1 . 1 1 3118 1 8 .
1 . 1 1 3215 1 8 .
1 . 1 1 3286 1 8 .
1 4 1 2 3360 0 8 1
1 . 0 4 3360 0 8 1
1 . 1 1 3382 1 8 .
1 . 1 1 5574 1 8 .
1 . 1 1 5639 1 8 .
1 . 1 1 5734 1 8 .
1 . 1 1 5808 1 8 .
1 . 1 1 5902 1 8 .
1 . 1 1 5977 1 8 .
1 . 1 1 6070 1 8 .
1 . 1 1 6143 1 8 .
1 . 1 1 6239 1 8 .
1 . 1 1 6311 1 8 .
1 . 1 1 6413 1 8 .
1 . 1 1 6485 1 8 .
; run;
data want;
set have;
by notsorted id;
retain sn 0
calc_sn; /* calculated serial number */
if first.id then sn=0;
if dose=1 then do;
calc_sn=.; output;
sn+1;
return;
end;
else calc_sn = sn;
if c1 ne . and c2=1 and c3 not in (1, 3) then do;
calc_sn=1; sn=1;
end;
output;
/* check */
if calc_sn ne serial then put _N_= serial= calc_sn=;
drop sn;
run;
@Shmuel: Thank you.
TIME, DOSE are important here. Several doses (>1) are given per subject and each dose is given in time. For instance, ID=2, received doses at time=0 and time=96. dose=1 and serial=. represent those dosing events. Serial samples are >= 3 consecutive samples (DOSE=0) associated with same dose (including predose samples). The samples at TIME=2 were set to SERIAL=1 as they are non-consecutive samples (<=3 consecutive samples), because at TIME=96 next dose is given. Following dose at TIME=96, there are serial samples (>=3) or <= 192 hr from the first dose (from TIME=100 to TIME=267), and hence these serial samples are coded as 2. Samples from TIME=604 to TIME=1637 are > 192 hr from first dose and hence were set to SERIAL=1.
For ID=1, First dose is given at TIME=0, second dose at TIME=72 & third dose at TIME=169. Samples between TIME=1 and TIME=169 are >= 3 consecutive samples or within <=192 hr from the first dose and hence are coded SERIAL=2. DOSES given at later TIME points such as 169, 240, 337 etc.. do not have >=3 samples subsequent to them and hence are coded as SERIAL=1. In case if >=3 consecutive samples are present after TIME=169, those samples will be coded as SERIAL=3 and so on.
Hope this information helps. Please let me know if you have any questions.
Thank you.
@ari, I have been too busy yesterday and today, so I want just to clarify some more points:
- Is ID the subject ID ?
- Do you mean by "sample" - a row/observation in the dataset ?
- If I want to sort the dataset by subject, what other variables I need
beyond ID (as subject ) and time - as there are "dupliacate" lines, regarding ID Time only.
- AmI right, to count consecutive samples I need count rows with DOSE=0 between
instances of DOSE=1.
- Would you say: if there are less than 3 consecutive samples then its Serial=1
otherwise Serial count from 2 on by 1 (2, 3, 4...) per subject (ID) ?
I hope I'll have time to continue tomorrow.
@Shmuel. Thanks for the response.
Yes, Id is the subject ID and sample refers to row/observation.
Sort order is: ID TIME DOSE C3 can be used to sort the records
Yes, consecutive samples can be identified by counting rows with DOSE=0 between instances of DOSE=1
if there are less than 3 consecutive samples then its Serial=1 otherwise Serial count from 2 on by 1 (2, 3, 4...) per subject (ID) ?
Yes, for instance, if there are 3 groups of 4 consecutive samples (or rows) are found per subject. 1st group wil be SERIAL=2, 2nd group will be SERIAL=3 and so on.
Thank you.
Please run next code. No changes to step 1 - creating HAVE dataset.
There are still few observations with serial NE cal_sn (calculated serial number);
Check dataset CHECK.
proc sort data=have out=haves; by id time dose c3; run;
data count_cs(rename=(k1=id k2=time k3=dose k4=c3));
set haves end=eof;
by id time dose c3;
retain phase 0 /* 0 1st time only */
k1 k2 k3 k4
count_cs; /* count consecutive samples */
if phase=1 and dose=1
or last.id
or eof then output;
if dose=1 then do;
phase=1;
k1=id; k2=time; k3=dose; k4=c3;
count_cs=0;
end;
if dose=0 then count_cs+1;
keep k1 k2 k3 k4 count_cs;
run;
data want;
merge haves
count_cs(rename=(count_cs=count));
by id time dose c3;
retain count_cs sn dose_time
calc_sn; /* sn = serial number */
lag_time = lag(time);
if first.id then sn=0;
if dose=1 then do;
dose_time=time;
count_cs=count;
calc_sn=.; output;
if count_cs < 3
then calc_sn=1;
else do;
if sn=0 then sn=2; else sn+1;
calc_sn=sn;
end;
end;
else do;
if time - dose_time > 192 then calc_sn=1;
output;
end;
drop sn count;
run;
data check; /* should contain 0 obs */
set want;
if serial ne calc_sn;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.