Uniq value validation

znhnm · Posted 12-09-2022 10:56 AM

Hi,

I have a dataset that col2 can have multiple values of col1 but col1 can have only 1 value of col2. However, I'd like to validate if there are any records that doesn't fit to this. How would I validate that each col1 has only one col2 value? (e.g., The first row of the sample table below is col1 =1 and col2 = A. I want to make sure "1" has only "A", not "B", "C", "D", etc... "A" can have 1, 2, 3... though.)

Regards,

COL1	COL2
1	A
2	A
3	A
4	B
5	B
6	B
7	C
8	C
9	D
10	K

PaigeMiller · Posted 12-09-2022 11:10 AM

Can you please provide a more meaningful data set, where some COL1 violate the condition and other COL1 values do not violate the condition?

--
Paige Miller

ballardw · Posted 12-09-2022 11:12 AM

Suggestion: Provide example data that has the problem you are looking. Then provide an example of how you want the problem reported.

Best is to provide example data in the form of a data step. Below is an example data set and one method of finding Col1 with more than 1 Col2 value.

data have;
input col1 col2 $;
datalines;
1	A
2	A
3	A
4	B
5	B
6	B
7	C
8	C
9	D
10	K
10 A    <= adds a case with multiple col2 for col1
;

/* one way: double count*/
proc freq data=have noprint;
   tables col1*col2/out=count1;
run;

proc freq data=count1 noprint;
   tables col1 / out=count2(where=(count>1));
run;

proc print data=count2 noobs;
   title "Col1 with more than 1 value for Col2";
   var col1;
run;title;

Paste data step code into a text box opened on the forum using the </> icon that appears above the message window. The message windows will reformat text and may result in data step code that will not run properly. Also, the box makes it easier to tell the difference between comment text and actual code.

s_lassen · Posted 12-09-2022 11:22 AM

In a data step, you could do something like this:

proc sort data=<whatever your input data is called>;
  by col1 col2;
run;

data errors;
  set <whatever your input data is called>;
  by col1 col2;
  if first.col2 and not first.col1;
run;

This will give you a table containing the COL1 values that have multiple COL2 values.

But if you need all the records for the erring COL1 values, it is probably easier to use SQL:

proc sql;
  create table errors as select * from <whatever your input data is called>
  group by col1
  having count(distinct col2)>1;
quit;

LinusH · Posted 12-12-2022 07:27 AM

For "How would I validate that each col1 has only one col2 value?", you could define COL1 as a primary key, hence not allowing for duplicates to be added.

Not sure if there was another constraint given your description...?

Data never sleeps

Uniq value validation

Re: Uniq value validation

Re: Uniq value validation

Re: Uniq value validation

Re: Uniq value validation

Uniq value validation

Re: Uniq value validation

Re: Uniq value validation

Re: Uniq value validation

Re: Uniq value validation

Click image to register for webinar

Classroom Training Available!