BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Emma8
Quartz | Level 8
Hi. I have a large data with many variables. I would like to find out how many variables have the same values (all names are different), for example, but if missing info, then ignore). Also, the real data have texts strings, dates, and numeric variables)



Var1 var2 var3 var4 var5 var 6
1 a 4 6 a1
9 l 6 g ‘’ 5
8 d 5 y d
7 & 7 y ‘’ 6

In this case the output should be var2 and var5 (second and last rows have missing info but ignored).

Thank you.
1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @Emma8 

 

You can use PROC COMPARE to achieve this, along with the NOMISS option.

- It can only compare pairs of variables with the same type. To select them, you can run the code twice, first with 'num' in PROC SQL, and then with 'char' as the type.

- A PROC SQL and a MACRO are used to store all comparisons pairwise in macrovariables. Typically, 

&var =    var1 var1 var1 var1 var2 var2 var2 var4 var4 var5

&with =   var2 var4 var5 var6 var4 var5 var6 var5 var6 var6

 

Output:

Capture d’écran 2020-06-20 à 17.50.54.png

 

Best,

 

data have;
	infile datalines dlm=" "  dsd missover;
	input Var1 $ var2 $  var3 $ var4 $ var5 $ var6 $;
	datalines;
1 a 4 6 a 1
9 l 6 g l 5
8 d 5 y d
7 & 7 y  6
;

proc sql noprint;
	select name
	into:var1 -
	from dictionary.columns
	where libname = 'WORK' and memname = 'HAVE' and type = "char"; /*put either num or char*/
	
	select count(name)
	into:nb_var
	from dictionary.columns
	where libname = 'WORK' and memname = 'HAVE' and type = "char"; /*put either num or char*/
quit;

%global var with; 
%let var=;
%let with=;
%macro pairs;
	%local i j;
	%do i = 1 %to &nb_var;
		%do j = %eval(&i+1) %to &nb_var;
			%let var=&var &&var&i;
			%let with=&with &&var&j;
		%end;
    %end;
%mend pairs;
%pairs;

/* Report */
proc compare base=have listequalvar novalues nomiss;
	var &var;
	with &with;
run;

 NB: you can leverage the following material to go one step further: https://support.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1654-2018.pdf

 

View solution in original post

19 REPLIES 19
ed_sas_member
Meteorite | Level 14

Hi @Emma8 

 

You can use PROC COMPARE to achieve this, along with the NOMISS option.

- It can only compare pairs of variables with the same type. To select them, you can run the code twice, first with 'num' in PROC SQL, and then with 'char' as the type.

- A PROC SQL and a MACRO are used to store all comparisons pairwise in macrovariables. Typically, 

&var =    var1 var1 var1 var1 var2 var2 var2 var4 var4 var5

&with =   var2 var4 var5 var6 var4 var5 var6 var5 var6 var6

 

Output:

Capture d’écran 2020-06-20 à 17.50.54.png

 

Best,

 

data have;
	infile datalines dlm=" "  dsd missover;
	input Var1 $ var2 $  var3 $ var4 $ var5 $ var6 $;
	datalines;
1 a 4 6 a 1
9 l 6 g l 5
8 d 5 y d
7 & 7 y  6
;

proc sql noprint;
	select name
	into:var1 -
	from dictionary.columns
	where libname = 'WORK' and memname = 'HAVE' and type = "char"; /*put either num or char*/
	
	select count(name)
	into:nb_var
	from dictionary.columns
	where libname = 'WORK' and memname = 'HAVE' and type = "char"; /*put either num or char*/
quit;

%global var with; 
%let var=;
%let with=;
%macro pairs;
	%local i j;
	%do i = 1 %to &nb_var;
		%do j = %eval(&i+1) %to &nb_var;
			%let var=&var &&var&i;
			%let with=&with &&var&j;
		%end;
    %end;
%mend pairs;
%pairs;

/* Report */
proc compare base=have listequalvar novalues nomiss;
	var &var;
	with &with;
run;

 NB: you can leverage the following material to go one step further: https://support.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1654-2018.pdf

 

Emma8
Quartz | Level 8
I got an warning message -text character over 60000 and truncated. I wonder how can do all about 500 variables with start prefix t_ against all variables over 100 variables with s_? All variables with prefix t_ by batch (5 variables at once with all the 100 variables s_)?
Shmuel
Garnet | Level 18

@Emma8, after hard work and discussion made in the forum, next code uses call sortC with vname fuction

to bring wanted results. 

The code is tested:

data have;
	infile datalines dlm=" "  dsd missover;
	input Var1 $ var2 $  var3 $ var4 $ var5 $ var6 $;
datalines;;
1 a 4 6 a 1
9 l 6 g l 5
8 d 5 y d  
7 & 7 y   6
;;;
run;

data _null_;
 set have;
     array _c {*} _character_;
     array _n {*} _numeric_ ;
     call symput('nvarC', left(dim(_c)));
     call symput('nvarN', left(dim(_n)));
run;
%put nvarC = &nvarC;
%put nvarN = &nvarN;

data want(drop=i );	
 set have;
     dlm = '10'x;   /* any none expected character in values may be used as delimiter */
     fmt = '$char2.'; /* force keep leading spaces */
     array _BEFORE  {*} _character_;
     nvarc = dim(_BEFORE);    
     array _NAMES {&nvarC} $10. _vnm1 - _vnm&nvarC;
     array _temp  {&nvarC} $20. _tmp1 - _tmp&nvarC;
 
     do i=1 to &nvarC;
        _names(i) = vname(_BEFORE(i));
        _temp(i) = catx(dlm, putc(_BEFORE(i), fmt), i );
     end;

	call sortC(of _temp(*));  
	
	do i=1 to dim(_temp)-1;
	   j=i+1;
	   value1 = scan(_temp(i),1, dlm);
	   value2 = scan(_temp(j),1, dlm);
	   if value1 = value2 then do;
	      ix1 = scan(_temp(i),2, dlm);
	      v1  = _names(ix1);
	      ix2 = scan(_temp(j),2, dlm);
	      v2  = _names(ix2);
	      put _N_= v1= v2= value1= value2= ;
	   end;
	end;
run;

output in log shows:

_N_=1 v1=Var1 v2=var6 value1=1 value2=1
_N_=1 v1=var2 v2=var5 value1=a value2=a
_N_=2 v1=var2 v2=var5 value1=l value2=l
_N_=3 v1=var2 v2=var5 value1=d value2=d
_N_=4 v1=Var1 v2=var3 value1=7 value2=7
 
In case you are interested in the forum discussion, it is in the next link:
 
The code may be adapted to any required output format.
Emma8
Quartz | Level 8
Sorry I got an error message that array subscript out of range
Shmuel
Garnet | Level 18

Would you please post the full log.

I shall check and repair where need.

Emma8
Quartz | Level 8
Nvarc=253
Nvarn=415

Also I changed dlm=“|”
( I have commas and open quotes in my text strings that I do not want to create separate variables from what I have—no | in my string)


Then it gives array subscript out of range at line 18565 column 17
Emma8
Quartz | Level 8
Sorry I changed the length format and seems running now!:) thank you
Emma8
Quartz | Level 8
Sorry it does not work —I removed variables with Length more than 900 characters.
Now I have
240 character and 415 numeric variables.
Still out of range array error message.
Could you fix the code without using dlm and force leading spaces ? I mean I used complb to compress all extra spaces
Emma8
Quartz | Level 8
I think it gives an error (array out of range) if variables length were longer than certain limit (perhaps 20).
Shmuel
Garnet | Level 18

@Emma8 , are you comparing the char type variables only or both numeric and character?

 

I'll be glad to help you fix the code but I need see the full log:

I need see all adaptations done to the code, warning and error messages, 

especially messages concerning "out of range at line ##### column ##" - because

line in log is not the same line in code and it changes per run. Usually such message

displays all variables values.

 

copy the full log into the </> icon-window  and post it.

Emma8
Quartz | Level 8
Thank you.
I am comparing numeric only with numeric etc.
it gives an error at _n_=7—when I remove some variables then it gives an error at different, for example, _n_=139
Shmuel
Garnet | Level 18

Without seeing the log I can only guess.

 

1) Pay attention that macro variable &nvarN is the number of all numeric variables in your data.

    When you omit some variables - depending on which line code or which data step - you may cause

    the index i point over the end of the array, thus resulting in "... array subscript out of range at line ..." error.

 

2) When you get such error message, the given line number points to the log line where the error occurred.

    It is important to check the code and the do loop in the program, containing the same code as in the log.

 

3) Sometimes such error may be the result of overriding the index i inside the loop.

 

If you want me to continue help with that code, supply the full log in the next post.

If you prefer to struggle the code by yourself, I wish you  success and appreciate it vary much.

 

BTW, if you compare only numeric variables, why have you post a test sample of character variables ?!

Emma8
Quartz | Level 8
Thanks.
It runs without any warning -but _n_=7 or others it gives the error message —that is the only error message and stops running.
I want to compare numeric data with numeric and character data with character variables
smantha
Lapis Lazuli | Level 10
data have;
	infile datalines dlm=" "  dsd missover;
	input Var1 $ var2 $  var3 $ var4 $ var5 $ var6 $;
	length similar_char_vars similar_num_vars $32767.;
	array _c{*} _character_;
	array _n{*} _numeric_;
	
	do i = 1 to dim(_c);
	similar_char_vars='';
	do j= i+1 to dim(_c);
	   if _c[i] = _c[j] and not missing(_c[i])then similar_char_vars = strip(similar_char_vars)||' '||vname(_c[i])||' '||vname(_c[j])||' '||strip(_c[j]);
	end;
	if ^missing(similar_char_vars) then output;
	end;

	do i = 1 to dim(_c);
	similar_num_vars='';
	do j= i+1 to dim(_c);
	   if _c[i] = _c[j] and not missing(_c[i])then similar_num_vars = strip(similar_num_vars)||' '||vname(_c[i])||' '||vname(_c[j])||' '||strip(_c[j]);
	end;
	if ^missing(similar_char_vars) then output;
	end;
	drop i j;
	put similar_char_vars=;
	datalines;;
1 a 4 6 a 1
9 l 6 g l 5
8 d 5 y d  
7 & 7 y   6
;;;
run;


Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 1945 views
  • 4 likes
  • 4 in conversation