BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SannaSanna
Quartz | Level 8

Hi. 

I have a very large dataset and I need to be able to locate data discrepancies to exclude from my rate calculation.  In my example below, column Verd1 is the main identifier and column LocalVerd is another identifier created by a vendor.  I need to figure out how to code to identify erroneous data within a group.  In the example below, under column Plan, Pacific would pass and is correct as the rows within Pacific contain consistent identifier information (Same Verd1 and same localVerd).  However, Plan Delta for product X33ui will need to be excluded as this product (X33ui) contain two different LocalVerd identifiers (PX333 and Rx988) for this same Verd1 identifier (Twa88)- LocalVerd should be PX333 but reported Rx988.   

 

Is there a way to code to output table containing all the Plan and product containing inconsistencies?    Data table is below and my output table I would like is the second table.  Thank you in advance

 

Plan Product Verd1 LocalVerd
Delta X33ui Twa88 PX333
Delta X33ui Twa88 PX333
Delta X33ui Twa88 RX988
Pacific V933aa Lbl322 QT770
Pacific V933aa Lbl322 QT770
       
Wanted OUTPUT    
Plan Product Verd1 LocalVerd
Delta X33ui Twa88 PX333
Delta X33ui Twa88 PX333
Delta X33ui Twa88 RX988

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
  infile datalines truncover;
  input (Plan Product Verd1 LocalVerd) ($);
  datalines;
Delta X33ui Twa88 PX333
Delta X33ui Twa88 PX333
Delta X33ui Twa88 RX988
Pacific V933aa Lbl322 QT770
Pacific V933aa Lbl322 QT770
;

proc sql;
create table want as
select *
 from have
  group by plan,product
   having count(distinct cats(Verd1,LocalVerd))>1 ;
quit;

View solution in original post

5 REPLIES 5
mkeintz
PROC Star

If the data are already grouiped by VERD1, then

 

 

data want (drop=n_local_changes);

  do n_local_changes=0 by 1 until (last.verd1);
    do until (last.localverd);
      set have;
      by verd1 localverd notsorted;
    end;
  end;

  do until (last.verd1;
    set have;
    by verd1;
    if n_local_changes>0;
  end;
run;

This counts the number of times the value of localverd changes within a given VERD1.  It's not necessarily the cardinality of localverd, unless the data within a verd1 is grouped by localverd.  Arter the change count is calculated, the verd1 group is re-read, and output if the count is at least 1.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Going forward many people appreciate if the one asking the question provides sample data via a fully working SAS data step like below:

data have;
  infile datalines truncover;
  input (Plan Product Verd1 LocalVerd) ($);
  datalines;
Delta X33ui Twa88 PX333
Delta X33ui Twa88 PX333
Delta X33ui Twa88 RX988
Pacific V933aa Lbl322 QT770
Pacific V933aa Lbl322 QT770
;

Here some code using the sample data created above that hopefully does what you're after or then at least points you into the right direction.

If you're just after the distinct list of rows with inconsistencies then you don't need the data step.

proc sort 
  data=have(keep=Plan Product Verd1 LocalVerd)
  out=_inter 
  nodupkey;
  by Plan Verd1 LocalVerd Product;
run;

proc sort data=_inter out=one_row_per_discrepancy nounikey;
  by Plan Verd1;
run;

data all_rows_discrepancies all_rows_ok;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'one_row_per_discrepancy');
      h1.defineKey(all:'y');
      h1.defineDone();
    end;
  set have;
  if h1.check()=0 then output all_rows_discrepancies;
  else output all_rows_ok;
run;

 

SannaSanna
Quartz | Level 8
thank you. This works but the results KSharp code provided below is what I am after. Thank you so much for your help!
Ksharp
Super User
data have;
  infile datalines truncover;
  input (Plan Product Verd1 LocalVerd) ($);
  datalines;
Delta X33ui Twa88 PX333
Delta X33ui Twa88 PX333
Delta X33ui Twa88 RX988
Pacific V933aa Lbl322 QT770
Pacific V933aa Lbl322 QT770
;

proc sql;
create table want as
select *
 from have
  group by plan,product
   having count(distinct cats(Verd1,LocalVerd))>1 ;
quit;
SannaSanna
Quartz | Level 8
Thank you so much KSharp!! I so appreciate your help. This was exactly what I was looking for. thank you again!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1017 views
  • 2 likes
  • 4 in conversation