Solved: How to count the number of common elements in different columns in SAS

disosa · Posted 06-20-2019 09:13 AM

Hello,

I hope this message finds you well. I would like to request assistance, if possible, with counting the number of common elements in different columns in SAS. I'm working with a dataset with ~120 observations, but will be using a simpler example to try to troubleshoot. I'm using SAS 9.4.

I was given this dataset with several values in one observation, separated by commas (data from a survey where respondents could select more than one response), as below:

I was able to use an array and do-loop to separate out the values into different columns, as below:

My question is, how can I count the number of common elements in these four different columns, to end up with something similar to this result in a table output in SAS:

When I use proc freq / tables to tabulate the frequencies of my fav_color1, fav_color2, and fav_color3 variables in SAS, I end up with three different tables with the frequencies of the favorite colors for each column, but another way to ask my question is: how can I combine these three tables to get the one table, as above?

Thank you so much for your assistance. Please let me know if I can clarify anything.

Diana

tsap · Posted 06-20-2019 09:56 AM

I created a dummy table using the data provided, and added an ID column, as a unique value for identifying each survey taker aligning with their answer values.

DATA WORK.Have;
FORMAT 		ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INFORMAT	ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INPUT  		ID	    Fave_Colors       Fave_Seasons       Fave_Songs;
INFILE DATALINES DLM=':' DSD;
DATALINES;
12345:2,8:4:2,4,6
23456:1,3,7,9:1,3:1,2
34567:4,9:1,4:6
45678:1,8:2,3:3,7,8
56789:2:1:8,9
;

DATA WORK.WANT (KEEP= ID Fave_Color);
	SET WORK.HAVE;
	ValueCt=COUNTW(Fave_Colors,',');
	DO i=1 TO ValueCt;
		Fave_Color=SCAN(Fave_Colors,i,',');
		OUTPUT;
	END;
RUN;

PROC FREQ DATA=WORK.WANT;		TABLES Fave_Color		/LIST MISSING NOPERCENT NOCUM;	RUN;

This will print a frequency that appears the way that you requested. If you require this to be an actual table versus just a printed frequency, just add the "OUT=" option after 'nocum' and fill in the name that you want for the new table with frequency details.

The only difference between my frequency and the one in your post is that, yours has rows for color values that were not selected by anyone. So color option '5' and '6'. Since none of the survey takers selected those rows, when you run a proc freq against the realigned data, they will not appear on the output.

Hope this helps.

View solution in original post

tsap · Posted 06-20-2019 09:56 AM

I created a dummy table using the data provided, and added an ID column, as a unique value for identifying each survey taker aligning with their answer values.

DATA WORK.Have;
FORMAT 		ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INFORMAT	ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INPUT  		ID	    Fave_Colors       Fave_Seasons       Fave_Songs;
INFILE DATALINES DLM=':' DSD;
DATALINES;
12345:2,8:4:2,4,6
23456:1,3,7,9:1,3:1,2
34567:4,9:1,4:6
45678:1,8:2,3:3,7,8
56789:2:1:8,9
;

DATA WORK.WANT (KEEP= ID Fave_Color);
	SET WORK.HAVE;
	ValueCt=COUNTW(Fave_Colors,',');
	DO i=1 TO ValueCt;
		Fave_Color=SCAN(Fave_Colors,i,',');
		OUTPUT;
	END;
RUN;

PROC FREQ DATA=WORK.WANT;		TABLES Fave_Color		/LIST MISSING NOPERCENT NOCUM;	RUN;

This will print a frequency that appears the way that you requested. If you require this to be an actual table versus just a printed frequency, just add the "OUT=" option after 'nocum' and fill in the name that you want for the new table with frequency details.

The only difference between my frequency and the one in your post is that, yours has rows for color values that were not selected by anyone. So color option '5' and '6'. Since none of the survey takers selected those rows, when you run a proc freq against the realigned data, they will not appear on the output.

Hope this helps.

Kc2 · Posted 06-20-2019 10:10 AM

Hi Diana,

if you add an id t your dataset you can do a proc transpose to have all the values in one column.

data have;
input fav_color1 fav_color2 fav_color3 fav_color4;
datalines;
2 8 . .
1 3 7 9
4 9 . .
1 8 . .
2 . . .

run;

data want ;
set have ;
id=strip(put(_n_,8.));
run;

proc transpose data=want out=tr_want(rename=( col1=color));
by id;
var fav_color1 fav_color2 fav_color3 fav_color4 ;
run;

proc freq data= tr_want noprint;
tables color/missing out=test;
title "Frequencies of color ";
run;

ErikLund_Jensen · Posted 06-20-2019 10:16 AM

Hi @disosa

I think it would be a better idea to transpose the answers to multiple observations and not multiple variables. In this form, data can be used in Proc Freq, and all subsequent transformation of data would be easier too.

The following example shows what I mean. Note that you don't get color favorites with no answers, because the program can't count what's not there, and it can't know that 5 or 6 could have been used as answer categories.

* Test data;
data have;
	infile datalines truncover;
	informat Survey_ID 8. FavoriteColors $char10.;
	input Survey_ID FavoriteColors;
	datalines;
1 2,8
2 1,3,7,9
3 4,9
4 1,8
5 2
;
run;

* Transpose multiple ansvers in string to multiple observations;
data temp (drop=FavoriteColors i); set have;
	do i = 1 to count(FavoriteColors,',')+1;
		FavoriteColor = scan(FavoriteColors,i,',');
		if FavoriteColor ne '' then output;
	end;
run;

* Count actual answers;
proc sql;
	create table want as
		select distinct
			FavoriteColor,
			count(*) as Frequency
		from temp
		group by FavoriteColor;
quit;

ballardw · Posted 06-20-2019 06:13 PM

One way for a report:

DATA WORK.Have;
FORMAT 		ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INFORMAT	ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INPUT  		ID	    Fave_Colors       Fave_Seasons       Fave_Songs;
INFILE DATALINES DLM=':' DSD;
DATALINES;
12345:2,8:4:2,4,6
23456:1,3,7,9:1,3:1,2
34567:4,9:1,4:6
45678:1,8:2,3:3,7,8
56789:2:1:8,9
;
proc sort data=work.have;
   by id;
run;

proc transpose data=work.have out=work.trans;
by id;
var fave_colors fave_seasons fave_songs;
run;

DATA WORK.temp ;
	SET WORK.trans;
	ValueCt=COUNTW(col1,',');
	DO i=1 TO ValueCt;
		value=SCAN(col1,i,',');
		OUTPUT;
	END;
RUN;

proc tabulate data=work.temp;
   class _name_ value;
   tables value='',
          _name_=''*n=''
          / box="Value" misstext='0'
   ;
run;

Either a custom format for the _name_ variable or and assignment in work.temp might get nicer column headings in the final table.

Or separate tables for each original variable.

proc tabulate data=work.temp;
   class _name_ value;
   tables _name_,
          value='',
          n='Frequency'
          / box="Value" misstext='0'
   ;
   label _name_='Topic';
run;

How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

Re: How to count the number of common elements in different columns in SAS

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!