BookmarkSubscribeRSS Feed
znhnm
Quartz | Level 8

Hi, 

 

I have a dataset that col2 can have multiple values of col1 but col1 can have only 1 value of col2. However, I'd like to validate if there are any records that doesn't fit to this. How would I validate that each col1 has only one col2 value? (e.g., The first row of the sample table below is col1 =1 and col2 = A. I want to make sure "1" has only "A", not "B", "C", "D", etc... "A" can have 1, 2, 3... though.)

 

Regards, 

 

COL1COL2 
1A
2A
3A
4B
5B
6B
7C
8C
9D
10K
  
4 REPLIES 4
PaigeMiller
Diamond | Level 26

Can you please provide a more meaningful data set, where some COL1 violate the condition and other COL1 values do not violate the condition?

--
Paige Miller
ballardw
Super User

Suggestion: Provide example data that has the problem you are looking. Then provide an example of how you want the problem reported.

 

Best is to provide example data in the form of a data step. Below is an example data set and one method of finding Col1 with more than 1 Col2 value.

data have;
input col1 col2 $;
datalines;
1	A
2	A
3	A
4	B
5	B
6	B
7	C
8	C
9	D
10	K
10 A    <= adds a case with multiple col2 for col1
;

/* one way: double count*/
proc freq data=have noprint;
   tables col1*col2/out=count1;
run;

proc freq data=count1 noprint;
   tables col1 / out=count2(where=(count>1));
run;

proc print data=count2 noobs;
   title "Col1 with more than 1 value for Col2";
   var col1;
run;title;

Paste data step code into a text box opened on the forum using the </> icon that appears above the message window. The message windows will reformat text and may result in data step code that will not run properly. Also, the box makes it easier to tell the difference between comment text and actual code.

 

 

s_lassen
Meteorite | Level 14

In a data step, you could do something like this:

proc sort data=<whatever your input data is called>;
  by col1 col2;
run;

data errors;
  set <whatever your input data is called>;
  by col1 col2;
  if first.col2 and not first.col1;
run;
  

This will give you a table containing the COL1 values that have multiple COL2 values. 

 

But if you need all the records for the erring COL1 values, it is probably easier to use SQL:

proc sql;
  create table errors as select * from <whatever your input data is called>
  group by col1
  having count(distinct col2)>1;
quit;
LinusH
Tourmaline | Level 20

For "How would I validate that each col1 has only one col2 value?", you could define COL1 as a primary key, hence not allowing for duplicates to be added.

Not sure if there was another constraint given your description...?

 

Data never sleeps

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 351 views
  • 3 likes
  • 5 in conversation