DATA Step, Macro, Functions and more

Match observation with another observation according to binary variable

Reply
N/A
Posts: 0

Match observation with another observation according to binary variable

data a;
input id mark sales;
cards;
111 1 999999
222 0 888888
333 0 777777
444 1 666666
555 1 555555
666 0 444444
777 0 333333
888 0 222222
999 0 111111
;
run;


Hi, I have the above data set, which is ordered by value of SALES. MARK takes only value of 0 or 1.

What I want to do is to extract the observations with MARK=1 and the next closet observation with MARK=0. By "next closet", I mean for the first observation#1 111 1 999999 the next closet observation is observation #2, which is 222 0 888888 (and hence observation #3 is omitted). Observation #4 corresponds to Obs #6, Obs #5 corresponds to Obs #7. Obs #8 and #9 omitted.

Hence the final result must have equal number of MARK=1 observations and MARK=0 observations, like this

id mark sales
111 1 999999
222 0 888888
444 1 666666
555 1 555555
666 0 444444
777 0 333333

Any idea will be appreciated.
Super Contributor
Super Contributor
Posts: 3,174

Re: Match observation with another observation according to binary variable

A SAS DATA step is one choice with INPUT, IF / THEN logic and OUTPUT to read up your input data, use a temporary RETAIN'd SAS variable (or LAG function) to identify the desired observations to keep.

Scott Barry
SBBWorks, Inc.

Recommended Google advanced search argument this topic/post:

data step programming site:sas.com
Super Contributor
Posts: 474

Re: Match observation with another observation according to binary variable

First you should sort by MARK SALES, both in a descending order.
[pre]
proc sort data=INDATA;
by descending MARK descending SALES;
run;
[/pre]
Then, assuming that MARK will hold a value which is 0 or 1, what you want is output every element with MARK=1 and every element with MARK=0 that are first in the group (if any). This can be done using the auto variables FIRST/LAST to identify the first element of the group:
[pre]
data OUTDATA;
set INDATA;
by descending MARK;
* output every MARK=1 and every first element of the group MARK=0;
if MARK=1 or first.MARK;
run;
[/pre]

Code above not tested!

Cheers from Portugal.

Daniel Santos @ www.cgd.pt
Ask a Question
Discussion stats
  • 2 replies
  • 112 views
  • 0 likes
  • 3 in conversation