Solved: Re: how to create multiple sub-databases from a time series database?

zglu · Posted 06-08-2016 06:08 PM

I have a dataset like below:

StudyID	Sereis	Hb	CO2
S001	1	21	49
S001	2	22	51
S001	3	23	53
S001	4	24	55
S001	5	25	57
S001	6	26	59
S001	7	27	61
S001	8	28	63
S001	9	29	65
S001	10	30	67
S002	1	18	45
S002	2	20	47
S002	3	22	49
S002	4	24	51
S002	5	26	53
S002	6	28	55

I'd like to create a dataset like follow. cutting data from series 1- 4 as 1st sub-dataset, then cutting from series from 3-6 as 2nd sub-dataset, and so on. would you please help?

StudyID	Sereis	Hb	CO2	Subset
S001	1	21	49	1
S001	2	22	51	1
S001	3	23	53	1
S001	4	24	55	1
S001	3	23	53	2
S001	4	24	55	2
S001	5	25	57	2
S001	6	26	59	2
S001	5	27	61	3
S001	6	28	63	3
S001	7	29	65	3
S001	8	30	67	3
S001	7	31	69	4
S001	8	32	71	4
S001	9	33	73	4
S001	10	34	75	4
S002	1	18	45	1
S002	2	20	47	1
S002	3	22	49	1
S002	4	24	51	1
S002	3	26	53	2
S002	4	28	55	2
S002	5	30	57	2
S002	6	32	59	2

Ksharp · Posted 06-09-2016 11:46 PM

Oh. I just realize the raw data is different with output . Try this one :



data have;
infile cards expandtabs truncover;
input StudyID $	Sereis	Hb	CO2;
cards;
S001	1	21	49
S001	2	22	51
S001	3	23	53
S001	4	24	55
S001	5	25	57
S001	6	26	59
S001	7	27	61
S001	8	28	63
S001	9	29	65
S001	10	30	67
S002	1	18	45
S002	2	20	47
S002	3	22	49
S002	4	24	51
S002	5	26	53
S002	6	28	55
;
run;
data temp;
 merge have have(keep=StudyID rename=(StudyID=_StudyID) firstobs=3);
 if mod(_n_,2)=1 or StudyID ne lag(StudyID) then group+1;
 idx=1;output;
 if StudyID=lag2(StudyID) and StudyID=_StudyID then do;idx=2;output;end;
 drop _StudyID;
run;
proc sort data=temp;
 by StudyID group idx Sereis;
run;
data temp;
 set temp;
 by StudyID;
 if first.StudyID then do;k=0;g=0;end;
 k+1;
 if mod(k,4)=1 then g+1;
 drop k group idx;
run;
proc sql noprint;
 select distinct cats('want',g) into : data separated by ' '
  from temp;
 select distinct cats('when(',g,') output want',g,';') into : select separated by ' '
  from temp;
run;
data &data ;
 set temp;
 select(g);
 &select
 otherwise;
 end;
 drop g;
run;

View solution in original post

ballardw · Posted 06-08-2016 06:54 PM

You will have to explain all of the rules involved in going from 10 records within an id and selectively dupicating some of them such as why were series 3 through 8 duplicated but 1, 2, 9 and 10 not?

Are there always exactly 10 initial values of series for an Id? If fewer or more what would the duplication pattern be?

Is the exact output order critical?

zglu · Posted 06-09-2016 10:35 AM

sure. this is a sample data from our study. we measured each patient's Hb and CO2 over time. like patient-1 (studyID=S001) had 10 time point (Sereis 1-10) measurements, but patient-2 (S002) only had 6 time point (Sereis 1-6) measurements. we want to cut the data for a speciafic time windows (here every 4 time points), but the next cut window should be re-set from half of previous window (i.e. 1st sub-set: from sereis 1 to 4; 2nd sub-set: from sereis 3 to 6 and so on). so there is no dulplicated at both begining and ending time point when we move the window forward.

thanks for help!

Reeza · Posted 06-08-2016 09:20 PM

In general you can do this as follows, this creates two datasets, one for females and one for males.

If you have many categories, or need some sort of dynamic set up it's more complex and you'll need to provide details for a better sample. This should help you get started.

data female male;
set sashelp.class;
if sex='F' then output female;
else if sex='M' then output male;
run;

zglu · Posted 06-09-2016 10:46 AM

thanks. good to know.

Ksharp · Posted 06-09-2016 12:10 AM

Does every StudyID group have the number which is multiply of 4 -- n*4 ?
If that was , that would be easy.

data temp;
 set have;
 by StudyID;
 if first.StudyID then group=0;
 if mod(_n_,4)=1 then group+1;
run;

after that use HashTable or SELECT to split the table.

zglu · Posted 06-09-2016 10:45 AM

Thanks Xia. Yes. I'd like to cut data by using same time window (here is 4 time points) for each studyID. but I need to re-set next window start point from half of the previous window. i.e. 1st group: from sereis 1-4; 2nd group: from sereis 3-6 (not from sereis 5-8).

Ksharp · Posted 06-09-2016 11:46 PM

Oh. I just realize the raw data is different with output . Try this one :



data have;
infile cards expandtabs truncover;
input StudyID $	Sereis	Hb	CO2;
cards;
S001	1	21	49
S001	2	22	51
S001	3	23	53
S001	4	24	55
S001	5	25	57
S001	6	26	59
S001	7	27	61
S001	8	28	63
S001	9	29	65
S001	10	30	67
S002	1	18	45
S002	2	20	47
S002	3	22	49
S002	4	24	51
S002	5	26	53
S002	6	28	55
;
run;
data temp;
 merge have have(keep=StudyID rename=(StudyID=_StudyID) firstobs=3);
 if mod(_n_,2)=1 or StudyID ne lag(StudyID) then group+1;
 idx=1;output;
 if StudyID=lag2(StudyID) and StudyID=_StudyID then do;idx=2;output;end;
 drop _StudyID;
run;
proc sort data=temp;
 by StudyID group idx Sereis;
run;
data temp;
 set temp;
 by StudyID;
 if first.StudyID then do;k=0;g=0;end;
 k+1;
 if mod(k,4)=1 then g+1;
 drop k group idx;
run;
proc sql noprint;
 select distinct cats('want',g) into : data separated by ' '
  from temp;
 select distinct cats('when(',g,') output want',g,';') into : select separated by ' '
  from temp;
run;
data &data ;
 set temp;
 select(g);
 &select
 otherwise;
 end;
 drop g;
run;

zglu · Posted 06-13-2016 10:05 AM

Thank you so much! this works very well in our project.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away