Solved: Selecting first observation on multiple conditions

windy · Posted 11-03-2021 01:41 AM

Hello everyone,

I need your help in this problem. Suppose I have the data set like that. I have multiple IDs and time periods. The time frame may be different across different IDs (ID1 exists from time 1 to time 5; ID2 exists from time 3 to 20, ID3 exists from time 6 to time 17). Then I have stress time variable receiving value of 1 if it's a stressed time and 0 otherwise. In this example, stress time happens twice (time 8-10 and time 16-17). Finally, I have var 1 where contains the information I want to extract.

No	ID	time	stress_time	var1
1	1	1	0	100
2	1	2	0	103
3	1	3	0	106
4	1	4	0	109
5	1	5	0	112
6	2	3	0	300
7	2	4	0	310
8	2	5	0	320
9	2	6	0	330
10	2	7	0	340
11	2	8	1	350
12	2	9	1	360
13	2	10	1	370
14	2	11	0	380
15	2	12	0	390
16	2	13	0	400
17	2	14	0	410
18	2	15	0	420
19	2	16	1	430
20	2	17	1	440
21	2	18	0	450
22	2	19	0	460
23	2	20	0	470
24	3	6	0	220
25	3	7	0	230
26	3	8	1	240
27	3	9	1	250
28	3	10	1	260
29	3	11	0	270
30	3	12	0	280
31	3	13	0	290
32	3	14	0	300
33	3	15	0	310
34	3	16	1	320
35	3	17	1	330

What I want to do is that I want to extract all of the first observations for each ID/Stress time frame. I tried with proc sort nodupkey by id stress_time, but it's not the right solution because I can't collect the first observation of the other stress time frame. In specific, I need to create a sample containing observations number 11, 19, 26, and 34. I think I need to use Proc sql for this problem, but I don't know how to do it yet 😞

Please help me out.

Best regards,

Windy.

PeterClemmensen · Posted 11-03-2021 02:23 AM

How about

data have;
input No ID time  stress_time var1;
datalines;
1  1  1  0  100
2  1  2  0  103
3  1  3  0  106
4  1  4  0  109
5  1  5  0  112
6  2  3  0  300
7  2  4  0  310
8  2  5  0  320
9  2  6  0  330
10 2  7  0  340
11 2  8  1  350
12 2  9  1  360
13 2  10 1  370
14 2  11 0  380
15 2  12 0  390
16 2  13 0  400
17 2  14 0  410
18 2  15 0  420
19 2  16 1  430
20 2  17 1  440
21 2  18 0  450
22 2  19 0  460
23 2  20 0  470
24 3  6  0  220
25 3  7  0  230
26 3  8  1  240
27 3  9  1  250
28 3  10 1  260
29 3  11 0  270
30 3  12 0  280
31 3  13 0  290
32 3  14 0  300
33 3  15 0  310
34 3  16 1  320
35 3  17 1  330
;

data want;
   set have;
   by ID stress_time notsorted;
   if first.stress_time and stress_time = 1;
run;

Result:

No  ID  time  stress_time  var1
11  2   8     1            350
19  2   16    1            430
26  3   8     1            240
34  3   16    1            320

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 11-03-2021 02:23 AM

How about

data have;
input No ID time  stress_time var1;
datalines;
1  1  1  0  100
2  1  2  0  103
3  1  3  0  106
4  1  4  0  109
5  1  5  0  112
6  2  3  0  300
7  2  4  0  310
8  2  5  0  320
9  2  6  0  330
10 2  7  0  340
11 2  8  1  350
12 2  9  1  360
13 2  10 1  370
14 2  11 0  380
15 2  12 0  390
16 2  13 0  400
17 2  14 0  410
18 2  15 0  420
19 2  16 1  430
20 2  17 1  440
21 2  18 0  450
22 2  19 0  460
23 2  20 0  470
24 3  6  0  220
25 3  7  0  230
26 3  8  1  240
27 3  9  1  250
28 3  10 1  260
29 3  11 0  270
30 3  12 0  280
31 3  13 0  290
32 3  14 0  300
33 3  15 0  310
34 3  16 1  320
35 3  17 1  330
;

data want;
   set have;
   by ID stress_time notsorted;
   if first.stress_time and stress_time = 1;
run;

Result:

No  ID  time  stress_time  var1
11  2   8     1            350
19  2   16    1            430
26  3   8     1            240
34  3   16    1            320

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 11-03-2021 02:30 AM

You need to have id in the BY, otherwise you miss an event where an old id ends with 1 and the new id also starts with 1.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PeterClemmensen · Posted 11-03-2021 02:52 AM

Of Course 🙂

Corrected.

The DATA to DATA Step Macro
Blog: SASnrd

Kurt_Bremser · Posted 11-03-2021 02:27 AM

Make sure that the dataset is sorted by id and time, then do:

data want;
set have;
by id time stress_time notsorted;
if stress_time and first.stress_time;
run;

Will extract any first occurence of stress_time = 1.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

windy · Posted 11-03-2021 04:17 AM

Thank you @PeterClemmensen and @Kurt_Bremser for your help.

I got what I need now.

Selecting first observation on multiple conditions

Re: Selecting first observation on multiple conditions

Re: Selecting first observation on multiple conditions

Re: Selecting first observation on multiple conditions

Re: Selecting first observation on multiple conditions

Re: Selecting first observation on multiple conditions

Re: Selecting first observation on multiple conditions

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!