DATA Step, Macro, Functions and more

Comparing the information in a row with the next row

Reply
Occasional Learner
Posts: 1

Comparing the information in a row with the next row

Hi everyone, I have a dataset with over 15 million entries. I'm looking to compare the row below with the one above to look for all unique ID's that have a different value next to them. For example:

 

AB1234567, 145633431
AB1234567, 145633431
AB1234568, 145633431
AB1234568, 145633432
AB1234566, 145631513
AB1234565, 145635315

 

In this case, I am creating a counter or loop to look for all unique ID's that appear twice, but have a different value. In this case the only one is AB1234568. I want to ignore AB1234567 because it has the same value next to it both times. 

 

I tried using a counter already and was able to find all values that came up twice but it was very inefficient and did not solve my problem:

Proc summary DATA=AUTO NWAY MISSING;

  class  NIV nopol;

  VAR   xyz;

  OUTPUT OUT=DOUBLES(DROP=_TYPE_ _FREQ_) sum= ;

RUN;

 

proc sort data=DOUBLES;

        by NIV NOPOL;

run;

DATA AUTO2;

SET DOUBLES;

IF XYZ=2;

RUN;

 

I have not used loops in SAS before but I am setting them up by using other people's code as an example. If anyone can help it's appreciated. Thanks

 

 

Super User
Super User
Posts: 7,401

Re: Comparing the information in a row with the next row

You can use either lag or retain functions too look at previous values:

data want;
  set have;
  if lag(id)=id and lag(value) ne value then new_flag=1;
run;
Super Contributor
Posts: 474

Re: Comparing the information in a row with the next row

[ Edited ]

Hi.

 

Assuming your data is grouped by ID but not sorted by group or by VALUE, here's one way using the lag function:

 

data want (rename=(_ID=ID));
        set have end=_EOF;
        drop ID VALUE _VALUE;
        _ID=lag1(ID);
        _VALUE=lag1(VALUE);
        if _ID eq ID and _VALUE ne VALUE then COUNT+1;
        if _EOF or _ID ne ID then do;
           if COUNT gt 1 then output;
           COUNT=1;
        end;
run;

 

This will output the ID and the count of distinct values for VALUE if > 1

 

More on LAG function here: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212547.htm


Hope it helps.

 

Daniel Santos @ www.cgd.pt

Super User
Posts: 5,083

Re: Comparing the information in a row with the next row

Unless I'm missing something, you're asking for something that SAS can do relatively easily:

 

data want;

set have;

by id value notsorted;

if first.value=0 or last.value=0;

run;

 

For the IDs you are looking for, it outputs multiple observations (not just one).

Ask a Question
Discussion stats
  • 3 replies
  • 156 views
  • 0 likes
  • 4 in conversation