BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zglu
Calcite | Level 5

I have a dataset like below:

StudyIDSereisHbCO2
S00112149
S00122251
S00132353
S00142455
S00152557
S00162659
S00172761
S00182863
S00192965
S001103067
S00211845
S00222047
S00232249
S00242451
S00252653
S002628

55

 

I'd like to create a dataset like follow. cutting data from series 1- 4 as 1st sub-dataset, then cutting from series from 3-6 as 2nd sub-dataset, and so on. would you please help?

StudyIDSereisHbCO2Subset
S001121491
S001222511
S001323531
S001424551
S001323532
S001424552
S001525572
S001626592
S001527613
S001628633
S001729653
S001830673
S001731694
S001832714
S001933734
S0011034754
S002118451
S002220471
S002322491
S002424511
S002326532
S002428552
S002530572
S002632592

  

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Oh. I just realize the raw data is different with output . Try this one :



data have;
infile cards expandtabs truncover;
input StudyID $	Sereis	Hb	CO2;
cards;
S001	1	21	49
S001	2	22	51
S001	3	23	53
S001	4	24	55
S001	5	25	57
S001	6	26	59
S001	7	27	61
S001	8	28	63
S001	9	29	65
S001	10	30	67
S002	1	18	45
S002	2	20	47
S002	3	22	49
S002	4	24	51
S002	5	26	53
S002	6	28	55
;
run;
data temp;
 merge have have(keep=StudyID rename=(StudyID=_StudyID) firstobs=3);
 if mod(_n_,2)=1 or StudyID ne lag(StudyID) then group+1;
 idx=1;output;
 if StudyID=lag2(StudyID) and StudyID=_StudyID then do;idx=2;output;end;
 drop _StudyID;
run;
proc sort data=temp;
 by StudyID group idx Sereis;
run;
data temp;
 set temp;
 by StudyID;
 if first.StudyID then do;k=0;g=0;end;
 k+1;
 if mod(k,4)=1 then g+1;
 drop k group idx;
run;
proc sql noprint;
 select distinct cats('want',g) into : data separated by ' '
  from temp;
 select distinct cats('when(',g,') output want',g,';') into : select separated by ' '
  from temp;
run;
data &data ;
 set temp;
 select(g);
 &select
 otherwise;
 end;
 drop g;
run;
 


View solution in original post

8 REPLIES 8
ballardw
Super User

You will have to explain all of the rules involved in going from 10 records within an id and selectively dupicating some of them such as why were series 3 through 8 duplicated but 1, 2, 9 and 10 not?

Are there always exactly 10 initial values of series for an Id? If fewer or more what would the duplication pattern be?

Is the exact output order critical?

 

zglu
Calcite | Level 5

sure. this is a sample data from our study. we measured each patient's Hb and CO2 over time. like patient-1 (studyID=S001) had 10 time point (Sereis 1-10) measurements, but patient-2 (S002) only had 6 time point (Sereis 1-6) measurements. we want to cut the data for a speciafic time windows (here every 4 time points), but the next cut window should be re-set from half of previous window (i.e. 1st sub-set: from sereis 1 to 4; 2nd sub-set: from sereis 3 to 6 and so on). so there is no dulplicated at both begining and ending time point when we move the window forward.

 

thanks for help! 

 

Reeza
Super User

In general you can do this as follows, this creates two datasets, one for females and one for males.

If you have many categories, or need some sort of dynamic set up it's more complex and you'll need to provide details for a better sample. This should help you get started.

 

data female male;
set sashelp.class;
if sex='F' then output female;
else if sex='M' then output male;
run;
zglu
Calcite | Level 5

thanks. good to know.

Ksharp
Super User
Does every StudyID group have the number which is multiply of 4 -- n*4 ?
If that was , that would be easy.

data temp;
 set have;
 by StudyID;
 if first.StudyID then group=0;
 if mod(_n_,4)=1 then group+1;
run;

after that use HashTable or SELECT to split the table.


zglu
Calcite | Level 5

Thanks Xia. Yes. I'd like to cut data by using same time window (here is 4 time points) for each studyID. but I need to re-set next window start point from half of the previous window. i.e. 1st group: from sereis 1-4; 2nd group: from sereis 3-6 (not from sereis 5-8).

Ksharp
Super User
Oh. I just realize the raw data is different with output . Try this one :



data have;
infile cards expandtabs truncover;
input StudyID $	Sereis	Hb	CO2;
cards;
S001	1	21	49
S001	2	22	51
S001	3	23	53
S001	4	24	55
S001	5	25	57
S001	6	26	59
S001	7	27	61
S001	8	28	63
S001	9	29	65
S001	10	30	67
S002	1	18	45
S002	2	20	47
S002	3	22	49
S002	4	24	51
S002	5	26	53
S002	6	28	55
;
run;
data temp;
 merge have have(keep=StudyID rename=(StudyID=_StudyID) firstobs=3);
 if mod(_n_,2)=1 or StudyID ne lag(StudyID) then group+1;
 idx=1;output;
 if StudyID=lag2(StudyID) and StudyID=_StudyID then do;idx=2;output;end;
 drop _StudyID;
run;
proc sort data=temp;
 by StudyID group idx Sereis;
run;
data temp;
 set temp;
 by StudyID;
 if first.StudyID then do;k=0;g=0;end;
 k+1;
 if mod(k,4)=1 then g+1;
 drop k group idx;
run;
proc sql noprint;
 select distinct cats('want',g) into : data separated by ' '
  from temp;
 select distinct cats('when(',g,') output want',g,';') into : select separated by ' '
  from temp;
run;
data &data ;
 set temp;
 select(g);
 &select
 otherwise;
 end;
 drop g;
run;
 


zglu
Calcite | Level 5

Thank you so much! this works very well in our project.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1473 views
  • 1 like
  • 4 in conversation