🔒 This topic is solved and locked.
Kade_M
Calcite | Level 5

Hello,

 

I am trying to test whether records of a particular category all have consistent values in 3 fields, ideally within a DATA step (no reason other than that this is how all of the other tests are done, and I'm a sucker for consistency).

 

Unfortunately this is work related and the data is highly confidential... but I believe I can align it very closely to a typical "store front" data structure for all intents and purposes.

 

There are 5 key fields I need to consider within this metaphor: Account Number, Order Number, Store Name, Product Name, and Product Price. I need to test that for each Account Number, across all of its Order Numbers, the Store Name, Product Name, and Product Price are all consistent (the same). Where this analogy breaks from reality is that the Store Name / Product Name / Product Price can be different for different Accounts... but within each Account, they must be the same.

 

I would love to share a code sample, but I feel as though DATA steps can only look at one record at a time, so I'm just not sure how to even get started... But maybe a mock data sample might help. The desired result here is that, due to Order 003 being different, all 3 records get flagged as inconsistent. (My tests all return a 0 or a 1 for pass/fail in the summary report.)

 

Account #   Order #   StoreName   ProductName   ProductPrice
00000001    001       Store1      Product1      100.00
00000001    002       Store1      Product1      100.00
00000001    003       Store2      Product1      100.00

 

Each account can have any number of Orders. I read up on being able to look at the previous/next row... but I couldn't ascertain how to apply that to my situation. I thought maybe I would simply need to iterate... compare 001 to 002, then 002 to 003... But that line of thought died when I asked myself how I would then go back and flag 001 as a fail if the 002 vs. 003 integrity check failed.
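[Editor's note] The "how do I flag the earlier rows too?" dilemma is really a two-pass problem, independent of SAS. A minimal plain-Python sketch (row layout simply mirrors the mock data above; field names are illustrative, not from any actual system):

```python
from collections import defaultdict

rows = [
    ("00000001", "001", "Store1", "Product1", 100.00),
    ("00000001", "002", "Store1", "Product1", 100.00),
    ("00000001", "003", "Store2", "Product1", 100.00),
]

# Pass 1: collect the distinct (store, product, price) triples per account.
triples = defaultdict(set)
for acct, order, store, product, price in rows:
    triples[acct].add((store, product, price))

# Pass 2: an account fails (flag = 1) when it has more than one distinct
# triple, and the flag applies to every row of that account -- including
# the early rows that "passed" their pairwise comparison.
flags = [(acct, order, int(len(triples[acct]) > 1)) for acct, order, *_ in rows]
```

Here all three rows of account 00000001 get flag 1, because Order 003 uses Store2. The accepted SAS answer below implements this same two-pass idea within a single DATA step.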

 

I feel at this point I'm over complicating and over thinking the problem. Any assistance is much appreciated.

 

I apologize for the lack of code and real data. 

 

Best Regards,

-Kade

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

@Kade_M Hi and welcome to the SAS Community 🙂

 

There are many ways to do this. Here is one. I added another group to your data for demonstration purposes.

 

Feel free to ask if you have any questions.

 

data have;
input Account $ Order $ StoreName $ ProductName $ ProductPrice;
datalines;
00000001 001 Store1 Product1 100.00
00000001 002 Store1 Product1 100.00
00000001 003 Store2 Product1 100.00
00000002 001 Store1 Product1 100.00
00000002 002 Store1 Product1 100.00
00000002 003 Store1 Product1 100.00
;

data want (drop=_:);
    /* First DOW loop: read the entire BY group, remember the first row's
       values in _-prefixed variables, and set flag=1 on any mismatch */
    do until (last.Account);
        set have;
        by Account;
        if first.Account then do;
            _StoreName    = StoreName;
            _ProductName  = ProductName;
            _ProductPrice = ProductPrice;
            flag          = 0;
        end;

        if _StoreName    ne StoreName   | 
           _ProductName  ne ProductName |
           _ProductPrice ne ProductPrice then flag = 1;
    end;

    /* Second DOW loop: re-read the same BY group and output every row,
       each carrying the group-level flag */
    do until (last.Account);
        set have;
        by Account;
        output;
    end;
run;

Result:

 

Account   Order  StoreName  ProductName  ProductPrice  flag
00000001  001    Store1     Product1     100           1
00000001  002    Store1     Product1     100           1
00000001  003    Store2     Product1     100           1
00000002  001    Store1     Product1     100           0
00000002  002    Store1     Product1     100           0
00000002  003    Store1     Product1     100           0

 

Kade_M
Calcite | Level 5

@PeterClemmensen Thank you! I am really excited about your solution as there are several things I'm unfamiliar with, so lots to learn here!

 

Looking over the code, it makes perfect sense. I have a couple questions though, just to make sure I understand what is happening.

  • (drop=_:); - What exactly is this doing? I understand you are prefixing values from previous rows with the underscore; is this deleting them from the resulting dataset? The colon acting as a sort of wildcard?
  • do until (last.Account); - I'm not sure what this second do until loop is performing. My only guess is that it's telling it, once we're at the last account... return the augmented dataset as the set have table?

 

My last question: I've been doing all of my tests in a DATA step as I mentioned... Each one sort of looks like the redacted sample below. Could I add this code into the same DATA step? I feel the answer is no, because then it would execute this loop on every single row as it processes the other tests.

 

*************************************************************;
* TEST_1    scrubbed description
*************************************************************;
IF field_1 ne '.' AND field_2 ne '.' THEN
    IF field_1 > field_2 THEN
        DO;
            TEST_1 = 1;
            HAS_ERROR = 1;
        END;
    ELSE TEST_1 = 0;
ELSE TEST_1 = 0;

Time is invaluable; thank you for sharing yours to help me learn.

 

Best Regards,

-Kade

PeterClemmensen
Tourmaline | Level 20

@Kade_M, anytime 🙂 Glad you found your solution.

 

- As to the drop=_: part, you are correct. I use the colon operator to specify a list of variables that begin with a common prefix.
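[Editor's note] For readers outside SAS, the effect of drop=_: can be pictured as filtering a record by name prefix. A hypothetical sketch in Python (the record values mirror the thread's data; this is an analogy, not how SAS implements it):

```python
# One record as it looks after the first DOW loop: the _-prefixed helper
# variables plus the computed flag.
record = {"Account": "00000001", "StoreName": "Store2",
          "_StoreName": "Store1", "_ProductName": "Product1", "flag": 1}

# The equivalent of drop=_: -- discard every variable whose name starts
# with the "_" prefix before writing the output dataset.
kept = {k: v for k, v in record.items() if not k.startswith("_")}
```

Only Account, StoreName, and flag survive; the underscore-prefixed working variables never reach the output.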

 

- The technique here is known as a 'Double DOW Loop'. There is plenty of information online if you want to get a good grasp of it. Put briefly, the second loop ensures that I flag every observation of the BY group where there is an inconsistency, not only the observations where the inconsistency itself appears.
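[Editor's note] The detect-then-emit structure of the Double DOW Loop can be sketched outside SAS as well. A minimal Python analogue (data reduced to account/order/store for brevity; assumes the input is already sorted by account, just as the SAS version requires sorted BY groups):

```python
from itertools import groupby

rows = [
    ("00000001", "001", "Store1"), ("00000001", "002", "Store1"),
    ("00000001", "003", "Store2"),
    ("00000002", "001", "Store1"), ("00000002", "002", "Store1"),
]

out = []
for acct, grp in groupby(rows, key=lambda r: r[0]):
    grp = list(grp)                       # buffer the whole BY group
    # "First loop": compare every row against the group's first row.
    flag = int(any(r[2] != grp[0][2] for r in grp))
    # "Second loop": emit every row of the group with the group-level flag,
    # so even rows preceding the inconsistency are marked.
    out.extend((*r, flag) for r in grp)
```

All rows of account 00000001 come out flagged 1 (Order 003 broke consistency), while account 00000002's rows are flagged 0.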

 

As to your final question, I'm not sure. I don't know how or where your field variables are calculated. However, my guess is that you can put the code at the bottom of the DATA step, i.e. after the second loop.

 

Regards.
