I have a dataset like below:
StudyID | Sereis | Hb | CO2 |
S001 | 1 | 21 | 49 |
S001 | 2 | 22 | 51 |
S001 | 3 | 23 | 53 |
S001 | 4 | 24 | 55 |
S001 | 5 | 25 | 57 |
S001 | 6 | 26 | 59 |
S001 | 7 | 27 | 61 |
S001 | 8 | 28 | 63 |
S001 | 9 | 29 | 65 |
S001 | 10 | 30 | 67 |
S002 | 1 | 18 | 45 |
S002 | 2 | 20 | 47 |
S002 | 3 | 22 | 49 |
S002 | 4 | 24 | 51 |
S002 | 5 | 26 | 53 |
S002 | 6 | 28 | 55 |
I'd like to create a dataset like follow. cutting data from series 1- 4 as 1st sub-dataset, then cutting from series from 3-6 as 2nd sub-dataset, and so on. would you please help?
StudyID | Sereis | Hb | CO2 | Subset |
S001 | 1 | 21 | 49 | 1 |
S001 | 2 | 22 | 51 | 1 |
S001 | 3 | 23 | 53 | 1 |
S001 | 4 | 24 | 55 | 1 |
S001 | 3 | 23 | 53 | 2 |
S001 | 4 | 24 | 55 | 2 |
S001 | 5 | 25 | 57 | 2 |
S001 | 6 | 26 | 59 | 2 |
S001 | 5 | 27 | 61 | 3 |
S001 | 6 | 28 | 63 | 3 |
S001 | 7 | 29 | 65 | 3 |
S001 | 8 | 30 | 67 | 3 |
S001 | 7 | 31 | 69 | 4 |
S001 | 8 | 32 | 71 | 4 |
S001 | 9 | 33 | 73 | 4 |
S001 | 10 | 34 | 75 | 4 |
S002 | 1 | 18 | 45 | 1 |
S002 | 2 | 20 | 47 | 1 |
S002 | 3 | 22 | 49 | 1 |
S002 | 4 | 24 | 51 | 1 |
S002 | 3 | 26 | 53 | 2 |
S002 | 4 | 28 | 55 | 2 |
S002 | 5 | 30 | 57 | 2 |
S002 | 6 | 32 | 59 | 2 |
Oh. I just realize the raw data is different with output . Try this one : data have; infile cards expandtabs truncover; input StudyID $ Sereis Hb CO2; cards; S001 1 21 49 S001 2 22 51 S001 3 23 53 S001 4 24 55 S001 5 25 57 S001 6 26 59 S001 7 27 61 S001 8 28 63 S001 9 29 65 S001 10 30 67 S002 1 18 45 S002 2 20 47 S002 3 22 49 S002 4 24 51 S002 5 26 53 S002 6 28 55 ; run; data temp; merge have have(keep=StudyID rename=(StudyID=_StudyID) firstobs=3); if mod(_n_,2)=1 or StudyID ne lag(StudyID) then group+1; idx=1;output; if StudyID=lag2(StudyID) and StudyID=_StudyID then do;idx=2;output;end; drop _StudyID; run; proc sort data=temp; by StudyID group idx Sereis; run; data temp; set temp; by StudyID; if first.StudyID then do;k=0;g=0;end; k+1; if mod(k,4)=1 then g+1; drop k group idx; run; proc sql noprint; select distinct cats('want',g) into : data separated by ' ' from temp; select distinct cats('when(',g,') output want',g,';') into : select separated by ' ' from temp; run; data &data ; set temp; select(g); &select otherwise; end; drop g; run;
You will have to explain all of the rules involved in going from 10 records within an id and selectively dupicating some of them such as why were series 3 through 8 duplicated but 1, 2, 9 and 10 not?
Are there always exactly 10 initial values of series for an Id? If fewer or more what would the duplication pattern be?
Is the exact output order critical?
sure. this is a sample data from our study. we measured each patient's Hb and CO2 over time. like patient-1 (studyID=S001) had 10 time point (Sereis 1-10) measurements, but patient-2 (S002) only had 6 time point (Sereis 1-6) measurements. we want to cut the data for a speciafic time windows (here every 4 time points), but the next cut window should be re-set from half of previous window (i.e. 1st sub-set: from sereis 1 to 4; 2nd sub-set: from sereis 3 to 6 and so on). so there is no dulplicated at both begining and ending time point when we move the window forward.
thanks for help!
In general you can do this as follows, this creates two datasets, one for females and one for males.
If you have many categories, or need some sort of dynamic set up it's more complex and you'll need to provide details for a better sample. This should help you get started.
data female male;
set sashelp.class;
if sex='F' then output female;
else if sex='M' then output male;
run;
thanks. good to know.
Does every StudyID group have the number which is multiply of 4 -- n*4 ? If that was , that would be easy. data temp; set have; by StudyID; if first.StudyID then group=0; if mod(_n_,4)=1 then group+1; run; after that use HashTable or SELECT to split the table.
Thanks Xia. Yes. I'd like to cut data by using same time window (here is 4 time points) for each studyID. but I need to re-set next window start point from half of the previous window. i.e. 1st group: from sereis 1-4; 2nd group: from sereis 3-6 (not from sereis 5-8).
Oh. I just realize the raw data is different with output . Try this one : data have; infile cards expandtabs truncover; input StudyID $ Sereis Hb CO2; cards; S001 1 21 49 S001 2 22 51 S001 3 23 53 S001 4 24 55 S001 5 25 57 S001 6 26 59 S001 7 27 61 S001 8 28 63 S001 9 29 65 S001 10 30 67 S002 1 18 45 S002 2 20 47 S002 3 22 49 S002 4 24 51 S002 5 26 53 S002 6 28 55 ; run; data temp; merge have have(keep=StudyID rename=(StudyID=_StudyID) firstobs=3); if mod(_n_,2)=1 or StudyID ne lag(StudyID) then group+1; idx=1;output; if StudyID=lag2(StudyID) and StudyID=_StudyID then do;idx=2;output;end; drop _StudyID; run; proc sort data=temp; by StudyID group idx Sereis; run; data temp; set temp; by StudyID; if first.StudyID then do;k=0;g=0;end; k+1; if mod(k,4)=1 then g+1; drop k group idx; run; proc sql noprint; select distinct cats('want',g) into : data separated by ' ' from temp; select distinct cats('when(',g,') output want',g,';') into : select separated by ' ' from temp; run; data &data ; set temp; select(g); &select otherwise; end; drop g; run;
Thank you so much! this works very well in our project.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.