BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
emilezola
Calcite | Level 5

Hey everyone,

 

So I agregated a table using proc sql's group by but I see there are a few lines that have duplicate keys. However I have 300 hundreds columns and I can't seem to find the difference manually. Is there a way to see which column are differents between these duplicates?

 

My data have numbers and strings.

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
tsap
Pyrite | Level 9

Ahh ok. Yeah I didn't encounter that with the dummy data because all of the variable names being reviewed were the same length.

So what you need to do is set the length of the Diff_Variable to at minimum the value of the longest variable.

 

 

I cleaned up the logic a little bit and created a new macro variable that contains the length for the longest variable name to use in the PROC SQL query below it.

 

%MACRO VARIABLEDIFFS(LIBNAME,MEMNAME);
	PROC SQL noprint; 
		select nvar					INTO :TOTALVAR		from dictionary.tables	where libname="&LIBNAME."	and memname="&MEMNAME.";
		select name					INTO :Variable1-	from dictionary.COLUMNS	where libname="&LIBNAME."	and memname="&MEMNAME.";
		select MAX(LENGTH(name))	INTO :MaxVarLength	from dictionary.COLUMNS	where libname="&LIBNAME."	and memname="&MEMNAME.";
	QUIT;

	%DO n=2 %TO &TOTALVAR;
		PROC SQL;
		CREATE TABLE WORK.WANT_&n.	AS
			SELECT DISTINCT a.&Variable1., "&&Variable&n."	AS Diff_Variable LENGTH=&MaxVarLength.
			FROM 		WORK.HAVE	AS a
			INNER JOIN	WORK.HAVE	AS b	ON a.&Variable1.=b.&Variable1.
			WHERE a.&&Variable&n. NE b.&&Variable&n.;
		QUIT;

		PROC APPEND BASE=WORK.WANT_ALL
					DATA=WORK.WANT_&N.;
		RUN; 
		PROC DELETE DATA=WORK.WANT_&N.;
	%END;
%MEND VARIABLEDIFFS;
%VARIABLEDIFFS(WORK,HAVE);

View solution in original post

7 REPLIES 7
emilezola
Calcite | Level 5

I tried this, but it only seems to be comparing variables between them. I am looking for something that can help me compare the same columns between two rows so I can see which columns are different.

tsap
Pyrite | Level 9

I threw together a dummy table and a macro statement to identify observations that have a duplicate id but a varying value in a different variable. The dummy table only has 6 variables total, but is built to accommodate tables with any number of variables.

 

Here is the logic:

DATA WORK.HAVE;
FORMAT 	 ID	$3. Var2 $16. 	Var3 $16. 	Var4 8. 	Var5 $16. 	Var6 $16.;
INFORMAT ID	$3. Var2 $16. 	Var3 $16. 	Var4 8. 	Var5 $16. 	Var6 $16.;
INPUT    ID		VAR2 		VAR3 		VAR4 		VAR5 		VAR6;
INFILE DATALINES DLM='|' DSD;
DATALINES;
123|Red|Blue|100|Yellow|Black
123|Red|Black|100|Yellow|Black
234|Orange|Yellow|80|White|Purple
234|Orange|Yellow|65|White|Purple
345|Yellow|White|90|Green|Brown
456|Green|Brown|25|Red|Orange
;



%MACRO VARIABLEDIFFS(LIBNAME,MEMNAME);
	PROC SQL noprint; 
		select nvar	INTO :TOTALVAR
		from dictionary.tables
		where libname="&LIBNAME."
		and memname="&MEMNAME.";
	QUIT;
	%PUT &=TOTALVAR.;


	proc sql noprint; 
		select name		INTO :Variable1-
		from dictionary.COLUMNS
		where libname="&LIBNAME."
		and memname="&MEMNAME."
		ORDER BY VARNUM; 
	QUIT;

	%DO n=2 %TO &TOTALVAR;
		PROC SQL;
		CREATE TABLE WORK.WANT_&n.	AS
			SELECT DISTINCT a.&Variable1., "&&Variable&n."	AS Diff_Variable
			FROM 		WORK.HAVE	AS a
			INNER JOIN	WORK.HAVE	AS b	ON a.&Variable1.=b.&Variable1.
			WHERE a.&&Variable&n. NE b.&&Variable&n.;
		QUIT;

		PROC APPEND DATA=WORK.WANT_&N.
					BASE=WORK.WANT_ALL;
		RUN; 
		PROC DELETE DATA=WORK.WANT_&N.;
	%END;
%MEND VARIABLEDIFFS;
%VARIABLEDIFFS(WORK,HAVE);

The result in the end, on the table WORK.WANT_ALL is:

ID	Diff_Variable
123 Var3 234 Var4

Showing that the observations with ID '123' vary on the variable 'Var3' and ID '234' varies on the variable 'Var4'.

Looking at the dummy data that was loaded for those ids:

 

For ID '123':

123|Red|Blue|100|Yellow|Black 
123|Red|Black|100|Yellow|Black

We can confirm that Var3 is different between the two observations. The first observation has 'Blue' for its value of Var3 and the next observation has the value of 'Black'.

 

When calling the macro statement, you include the libname and the table name. In my example it was 'WORK' and 'HAVE'.

 

I hope this helps

emilezola
Calcite | Level 5

Thanks for the answer ! I'm having some trouble executing your code. I did create a table named have into my work library with only duplicate data but I'm getting this error :

 

 

WARNING: Variable Diff_Variable has different lengths on BASE and DATA files (BASE 10 DATA 16).
ERROR: No appending done because of anomalies listed above. Use FORCE option to append these files.

 

How do I reformat Diff_Variable in this case?

tsap
Pyrite | Level 9

Ahh ok. Yeah I didn't encounter that with the dummy data because all of the variable names being reviewed were the same length.

So what you need to do is set the length of the Diff_Variable to at minimum the value of the longest variable.

 

 

I cleaned up the logic a little bit and created a new macro variable that contains the length for the longest variable name to use in the PROC SQL query below it.

 

%MACRO VARIABLEDIFFS(LIBNAME,MEMNAME);
	PROC SQL noprint; 
		select nvar					INTO :TOTALVAR		from dictionary.tables	where libname="&LIBNAME."	and memname="&MEMNAME.";
		select name					INTO :Variable1-	from dictionary.COLUMNS	where libname="&LIBNAME."	and memname="&MEMNAME.";
		select MAX(LENGTH(name))	INTO :MaxVarLength	from dictionary.COLUMNS	where libname="&LIBNAME."	and memname="&MEMNAME.";
	QUIT;

	%DO n=2 %TO &TOTALVAR;
		PROC SQL;
		CREATE TABLE WORK.WANT_&n.	AS
			SELECT DISTINCT a.&Variable1., "&&Variable&n."	AS Diff_Variable LENGTH=&MaxVarLength.
			FROM 		WORK.HAVE	AS a
			INNER JOIN	WORK.HAVE	AS b	ON a.&Variable1.=b.&Variable1.
			WHERE a.&&Variable&n. NE b.&&Variable&n.;
		QUIT;

		PROC APPEND BASE=WORK.WANT_ALL
					DATA=WORK.WANT_&N.;
		RUN; 
		PROC DELETE DATA=WORK.WANT_&N.;
	%END;
%MEND VARIABLEDIFFS;
%VARIABLEDIFFS(WORK,HAVE);
emilezola
Calcite | Level 5

Works just fine ! Thank you very much !

tsap
Pyrite | Level 9
You're welcome.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1097 views
  • 0 likes
  • 3 in conversation