🔒 This topic is solved and locked.
Kade_M
Calcite | Level 5

Hello,

 

I am trying to test whether records of a particular category all have consistent values in 3 fields, ideally within a DATA step (no reason other than that this is how all of the other tests are done, and I'm a sucker for consistency).

 

Unfortunately this is work related and the data is highly confidential... but I believe I can align it very closely to a typical "store front" data structure for all intents and purposes.

 

There are 5 key fields I need to consider within this metaphor: Account Number, Order Number, Store Name, Product Name, and Product Price. I need to test that for each Account Number, across all of its Order Numbers, the Store Name, Product Name, and Product Price are all consistent (the same). Where this analogy breaks from reality is that the Store Name / Product Name / Product Price can be different for different Accounts... but within each Account, they must be the same.

 

I would love to share a code sample, but I feel as though DATA steps can only look at one record at a time, so I'm just not sure how to even get started... But maybe a mock data sample might help. The desired result here is that, due to Order 003 being different, all 3 records get flagged as inconsistent. (My tests all return a 0 or a 1 for pass/fail in the summary report.)

 

Account #   Order #   StoreName   ProductName   ProductPrice
00000001    001       Store1      Product1      100.00
00000001    002       Store1      Product1      100.00
00000001    003       Store2      Product1      100.00

 

Each account can have any number of Orders. I read up on being able to look at the previous/next row... but I couldn't ascertain how to apply that to my situation. I thought maybe I would simply need to iterate... compare 001 to 002, then 002 to 003... But that line of thought died when I asked myself how I would then go back and flag 001 as a fail if the 002 vs. 003 integrity check failed.
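[Editor's note] The "how do I flag the earlier rows too?" dilemma is really a two-pass problem, independent of SAS. A minimal plain-Python sketch (row layout simply mirrors the mock data above; field names are illustrative, not from any actual system):

```python
from collections import defaultdict

rows = [
    ("00000001", "001", "Store1", "Product1", 100.00),
    ("00000001", "002", "Store1", "Product1", 100.00),
    ("00000001", "003", "Store2", "Product1", 100.00),
]

# Pass 1: collect the distinct (store, product, price) triples per account.
triples = defaultdict(set)
for acct, order, store, product, price in rows:
    triples[acct].add((store, product, price))

# Pass 2: an account fails (flag = 1) when it has more than one distinct
# triple, and the flag applies to every row of that account -- including
# the early rows that "passed" their pairwise comparison.
flags = [(acct, order, int(len(triples[acct]) > 1)) for acct, order, *_ in rows]
```

Here all three rows of account 00000001 get flag 1, because Order 003 uses Store2. The accepted SAS answer below implements this same two-pass idea within a single DATA step.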

 

I feel at this point I'm over complicating and over thinking the problem. Any assistance is much appreciated.

 

I apologize for the lack of code and real data. 

 

Best Regards,

-Kade

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

@Kade_M Hi and welcome to the SAS Community 🙂

 

There are many ways to do this. Here is one. I added another group to your data for demonstration purposes.

 

Feel free to ask if you have any questions.

 

data have;
input Account $ Order $ StoreName $ ProductName $ ProductPrice;
datalines;
00000001 001 Store1 Product1 100.00
00000001 002 Store1 Product1 100.00
00000001 003 Store2 Product1 100.00
00000002 001 Store1 Product1 100.00
00000002 002 Store1 Product1 100.00
00000002 003 Store1 Product1 100.00
;

data want (drop=_:);
    /* First DOW loop: read the entire BY group, remember the first row's
       values in _-prefixed variables, and set flag=1 on any mismatch */
    do until (last.Account);
        set have;
        by Account;
        if first.Account then do;
            _StoreName    = StoreName;
            _ProductName  = ProductName;
            _ProductPrice = ProductPrice;
            flag          = 0;
        end;

        if _StoreName    ne StoreName   | 
           _ProductName  ne ProductName |
           _ProductPrice ne ProductPrice then flag = 1;
    end;

    /* Second DOW loop: re-read the same BY group and output every row,
       each carrying the group-level flag */
    do until (last.Account);
        set have;
        by Account;
        output;
    end;
run;

Result:

 

Account   Order  StoreName  ProductName  ProductPrice  flag
00000001  001    Store1     Product1     100           1
00000001  002    Store1     Product1     100           1
00000001  003    Store2     Product1     100           1
00000002  001    Store1     Product1     100           0
00000002  002    Store1     Product1     100           0
00000002  003    Store1     Product1     100           0

 

Kade_M
Calcite | Level 5

@PeterClemmensen Thank you! I am really excited about your solution as there are several things I'm unfamiliar with, so lots to learn here!

 

Looking over the code, it makes perfect sense. I have a couple questions though, just to make sure I understand what is happening.

  • (drop=_:); - What exactly is this doing? I understand you are prefixing values from previous rows with the underscore; is this deleting them from the resulting dataset? The colon acting as a sort of wildcard?
  • do until (last.Account); - I'm not sure what this second do until loop is performing. My only guess is that it's telling it, once we're at the last account... return the augmented dataset as the set have table?

 

My last question: I've been doing all of my tests in a DATA step as I mentioned... Each one sort of looks like the redacted sample below. Could I add this code into the same DATA step? I feel the answer is no, because then it would execute this loop on every single row as it processes the other tests.

 

*************************************************************;
* TEST_1    scrubbed description
*************************************************************;
IF field_1 ne '.' AND field_2 ne '.' THEN
    IF field_1 > field_2 THEN
        DO;
            TEST_1 = 1;
            HAS_ERROR = 1;
        END;
    ELSE TEST_1 = 0;
ELSE TEST_1 = 0;

Time is invaluable; thank you for sharing yours to help me learn.

 

Best Regards,

-Kade

PeterClemmensen
Tourmaline | Level 20

@Kade_M, anytime 🙂 Glad you found your solution.

 

- As to the drop=_: part, you are correct. I use the colon operator to specify a list of variables that begin with a common prefix.
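[Editor's note] For readers outside SAS, the effect of drop=_: can be pictured as filtering a record by name prefix. A hypothetical sketch in Python (the record values mirror the thread's data; this is an analogy, not how SAS implements it):

```python
# One record as it looks after the first DOW loop: the _-prefixed helper
# variables plus the computed flag.
record = {"Account": "00000001", "StoreName": "Store2",
          "_StoreName": "Store1", "_ProductName": "Product1", "flag": 1}

# The equivalent of drop=_: -- discard every variable whose name starts
# with the "_" prefix before writing the output dataset.
kept = {k: v for k, v in record.items() if not k.startswith("_")}
```

Only Account, StoreName, and flag survive; the underscore-prefixed working variables never reach the output.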

 

- The technique here is known as a 'Double DOW Loop'. There is plenty of information online if you want to get a good grasp of it. Put briefly, the second loop ensures that I flag every observation of the BY group where there is an inconsistency, not only the observations where the inconsistency itself appears.
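[Editor's note] The detect-then-emit structure of the Double DOW Loop can be sketched outside SAS as well. A minimal Python analogue (data reduced to account/order/store for brevity; assumes the input is already sorted by account, just as the SAS version requires sorted BY groups):

```python
from itertools import groupby

rows = [
    ("00000001", "001", "Store1"), ("00000001", "002", "Store1"),
    ("00000001", "003", "Store2"),
    ("00000002", "001", "Store1"), ("00000002", "002", "Store1"),
]

out = []
for acct, grp in groupby(rows, key=lambda r: r[0]):
    grp = list(grp)                       # buffer the whole BY group
    # "First loop": compare every row against the group's first row.
    flag = int(any(r[2] != grp[0][2] for r in grp))
    # "Second loop": emit every row of the group with the group-level flag,
    # so even rows preceding the inconsistency are marked.
    out.extend((*r, flag) for r in grp)
```

All rows of account 00000001 come out flagged 1 (Order 003 broke consistency), while account 00000002's rows are flagged 0.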

 

As to your final question, I'm not sure. I don't know how or where your field variables are calculated. However, my guess is that you can put the code at the bottom of the DATA step, i.e. after the second loop.

 

Regards.
