Data management question

Reply
Frequent Contributor
Posts: 75

Data management question

Suppose the hypothetical data looks like this:

data temp;

input ID Rec1 OrderRec1 Rec2 OrderRec2;

datalines;

001     10     1     .         1

001      .       2      10     2

001     11     3      11     3

001     12     4     .         4

001     .        5     12      5

;

The set-up is: For each cluster of ID (here I give only ID=001), there always exists pairs of Rec1 and Rec2 such that Rec1=Rec2. These variables can be non-missing for each observation (as in the third row) or one can be missing in one observation and appears in the other. The order variables are always available, they have certain categories (here 5) and indicate the ordering of Rec1 and Rec2.

I now want to create a data that contrast Rec1 and Rec2 for each cluster of ID, and if they are equal, create a new variable (say OrderRec2New) that equals the value of OrderRec2.  In other words, I want to make a new data with records that look like row 3. Unnecessary information from Rec2 and OrderRec2 can be deleted, it looks like this:

001     10     1     10     2

001     11     3     11     3

001     12     4     12     5

Super Contributor
Posts: 307

Re: Data management question

Let's see if I've got this right . . . .

If REC1 = REC2 then keep the record (e.g. 3rd record in your sample input data).

If REC2 is missing, then combine the record with the next sequential record (e.g. first two records of your input data become one row, with REC1, ORDERREC1 from the first row and REC2, ORDERREC2 from the second row).

Assuming I understand this correctly, here's a PROC SQL approach:

proc sql;

     create table want as

     /* get records that already have matches */

     ( select *

     from temp

     where rec1 = rec2 )

    UNION

     /* merge non-matches */

     ( select t1.*, t2.rec2, t2.orderrec2

     from temp t1

          /* get the missing values from the next obs */

          inner join

          ( select t2.*

            from temp t2

            where t2.rec1 is missing and t2.rec2 is not missing ) r2

     on r2.id = t1.id

          and r2.recno = t1.recno + 1

     where t1.rec1 is not missing and t1.rec2 is missing )

;

quit;

Super Contributor
Posts: 644

Re: Data management question

A datastep merge might be simpler, assuming the data is ordered as shown:

Data want ;

     Merge    have (where = (Rec1 is not Null)

                           drop = Rec2 orderRec2

                          )

                  have (where = (Rec2 is not Null)                        

                           drop = Rec1 orderRec1

                          )

                  ;

     By     ID ;

Run ;

Richard

Super User
Posts: 5,256

Re: Data management question

Even a simple SQL inner join would work, and does not require that the data is presorted.

Data never sleeps
Community Manager
Posts: 486

Re: Data management question

I'm glad you received a couple of helpful answers, NonSleeper. I wanted to let you know that there is a SAS Support Data Management Community where you can post questions like these. You may find other useful resources there as well.

Anna

Ask a Question
Discussion stats
  • 4 replies
  • 243 views
  • 10 likes
  • 5 in conversation