BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hjjijkkl
Pyrite | Level 9

I have a big data where the ID repeats multiple times (table 1) in a column and I want to reduce into 1/observation (table 2). Ones the IDs are reduces to 1 per obs, I want to merge table 2 with table 3.

Table 1

ID

1

1

2

3

3

3

4

4

 

I want the output to be

Table 2

ID

1

2

3

4

 

I want to merge table 2 with table 3 below to have an output like table 4

Table 3

ID

Visit1

Visit2

Visit3

1

1/4/18

6/9/18

10/12/18

3

7/8/16

7/30/18

9/6/18

 

I want the output to be

Table 4

ID

Visit1

Visit2

Visit3

1

1/4/18

6/9/18

10/12/18

2

.

.

.

3

7/8/16

7/30/18

9/6/18

4

.

.

.

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

If your data is sorted by id then one way:

data table_2;
   set table_1;
   by id;
   if first.id;
run;

When the data step uses a BY statement each variable on the statement gets two automatic temporary variables that have values 1 (true) and 0 (false) to indicate if the current observation is the First or Last of the by group. You access those with  the  First.variablename or Last.variablename construct.

The IF is a subsetting if and only keeps records for the output data set where the condition is true, i.e. the first record with that value of Id.

View solution in original post

3 REPLIES 3
Astounding
PROC Star

It might be easier to skip creating Table_2 entirely:

data table_4;
   merge table_1 table_3;
   by ID;
   if first.id;
run;

Of course this assumes that you don't really need Table_2 as long as you can get Table_4.

 

It requires both data sets to be in sorted order as a starting point.

hjjijkkl
Pyrite | Level 9
Thanks! But, I still need table 2? Please show me how to generate table 2.
ballardw
Super User

If your data is sorted by id then one way:

data table_2;
   set table_1;
   by id;
   if first.id;
run;

When the data step uses a BY statement each variable on the statement gets two automatic temporary variables that have values 1 (true) and 0 (false) to indicate if the current observation is the First or Last of the by group. You access those with  the  First.variablename or Last.variablename construct.

The IF is a subsetting if and only keeps records for the output data set where the condition is true, i.e. the first record with that value of Id.