BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
disosa
Calcite | Level 5

Hello,

 

I hope this message finds you well. I would like to request assistance, if possible, with counting the number of common elements in different columns in SAS. I'm working with a dataset with ~120 observations, but will be using a simpler example to try to troubleshoot. I'm using SAS 9.4.

 

I was given this dataset with several values in one observation, separated by commas (data from a survey where respondents could select more than one response), as below:

 

Screen Shot 2019-06-20 at 9.02.57 AM.png

 

I was able to use an array and do-loop to separate out the values into different columns, as below:

 

Screen Shot 2019-06-20 at 9.06.12 AM.png

 

My question is, how can I count the number of common elements in these four different columns, to end up with something similar to this result in a table output in SAS:

 

Screen Shot 2019-06-20 at 9.08.16 AM.png

 

When I use proc freq / tables to tabulate the frequencies of my fav_color1, fav_color2, and fav_color3 variables in SAS, I end up with three different tables with the frequencies of the favorite colors for each column, but another way to ask my question is: how can I combine these three tables to get the one table, as above?

 

Thank you so much for your assistance. Please let me know if I can clarify anything.

 

Diana

1 ACCEPTED SOLUTION

Accepted Solutions
tsap
Pyrite | Level 9

I created a dummy table using the data provided, and added an ID column, as a unique value for identifying each survey taker aligning with their answer values.

 

DATA WORK.Have;
FORMAT 		ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INFORMAT	ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INPUT  		ID	    Fave_Colors       Fave_Seasons       Fave_Songs;
INFILE DATALINES DLM=':' DSD;
DATALINES;
12345:2,8:4:2,4,6
23456:1,3,7,9:1,3:1,2
34567:4,9:1,4:6
45678:1,8:2,3:3,7,8
56789:2:1:8,9
;

DATA WORK.WANT (KEEP= ID Fave_Color);
	SET WORK.HAVE;
	ValueCt=COUNTW(Fave_Colors,',');
	DO i=1 TO ValueCt;
		Fave_Color=SCAN(Fave_Colors,i,',');
		OUTPUT;
	END;
RUN;

PROC FREQ DATA=WORK.WANT;		TABLES Fave_Color		/LIST MISSING NOPERCENT NOCUM;	RUN;

 

This will print a frequency that appears the way that you requested. If you require this to be an actual table versus just a printed frequency, just add the "OUT="  option after 'nocum' and fill in the name that you want for the new table with frequency details.

 

The only difference between my frequency and the one in your post is that, yours has rows for color values that were not selected by anyone. So color option '5' and '6'. Since none of the survey takers selected those rows, when you run a proc freq against the realigned data, they will not appear on the output.

 

Hope this helps.

View solution in original post

4 REPLIES 4
tsap
Pyrite | Level 9

I created a dummy table using the data provided, and added an ID column, as a unique value for identifying each survey taker aligning with their answer values.

 

DATA WORK.Have;
FORMAT 		ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INFORMAT	ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INPUT  		ID	    Fave_Colors       Fave_Seasons       Fave_Songs;
INFILE DATALINES DLM=':' DSD;
DATALINES;
12345:2,8:4:2,4,6
23456:1,3,7,9:1,3:1,2
34567:4,9:1,4:6
45678:1,8:2,3:3,7,8
56789:2:1:8,9
;

DATA WORK.WANT (KEEP= ID Fave_Color);
	SET WORK.HAVE;
	ValueCt=COUNTW(Fave_Colors,',');
	DO i=1 TO ValueCt;
		Fave_Color=SCAN(Fave_Colors,i,',');
		OUTPUT;
	END;
RUN;

PROC FREQ DATA=WORK.WANT;		TABLES Fave_Color		/LIST MISSING NOPERCENT NOCUM;	RUN;

 

This will print a frequency that appears the way that you requested. If you require this to be an actual table versus just a printed frequency, just add the "OUT="  option after 'nocum' and fill in the name that you want for the new table with frequency details.

 

The only difference between my frequency and the one in your post is that, yours has rows for color values that were not selected by anyone. So color option '5' and '6'. Since none of the survey takers selected those rows, when you run a proc freq against the realigned data, they will not appear on the output.

 

Hope this helps.

Kc2
Quartz | Level 8 Kc2
Quartz | Level 8

Hi Diana,

 

if you add an id  t your dataset you can do a proc transpose to have all the values in one column.

 

data have;
input fav_color1 fav_color2 fav_color3 fav_color4;
datalines;
2 8 . .
1 3 7 9
4 9 . .
1 8 . .
2 . . .

run;

 

data want ;
set have ;
id=strip(put(_n_,8.));
run;

 

proc transpose data=want out=tr_want(rename=( col1=color));
by id;
var fav_color1 fav_color2 fav_color3 fav_color4 ;
run;

 

proc freq data= tr_want noprint;
tables color/missing out=test;
title "Frequencies of color ";
run;

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @disosa 

 

I think it would be a better idea to transpose the answers to multiple observations and not multiple variables. In this form, data can be used in Proc Freq, and all subsequent transformation of data would be easier too.

 

The following example shows what I mean. Note that you don't get color favorites with no answers, because the program can't count what's not there, and it can't know that 5 or 6 could have been used as answer categories.

 

* Test data;
data have;
	infile datalines truncover;
	informat Survey_ID 8. FavoriteColors $char10.;
	input Survey_ID FavoriteColors;
	datalines;
1 2,8
2 1,3,7,9
3 4,9
4 1,8
5 2
;
run;

* Transpose multiple ansvers in string to multiple observations;
data temp (drop=FavoriteColors i); set have;
	do i = 1 to count(FavoriteColors,',')+1;
		FavoriteColor = scan(FavoriteColors,i,',');
		if FavoriteColor ne '' then output;
	end;
run;

* Count actual answers;
proc sql;
	create table want as
		select distinct
			FavoriteColor,
			count(*) as Frequency
		from temp
		group by FavoriteColor;
quit;
ballardw
Super User

One way for a report:

DATA WORK.Have;
FORMAT 		ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INFORMAT	ID $5.	Fave_Colors $10.  Fave_Seasons $10.  Fave_Songs $10.;
INPUT  		ID	    Fave_Colors       Fave_Seasons       Fave_Songs;
INFILE DATALINES DLM=':' DSD;
DATALINES;
12345:2,8:4:2,4,6
23456:1,3,7,9:1,3:1,2
34567:4,9:1,4:6
45678:1,8:2,3:3,7,8
56789:2:1:8,9
;
proc sort data=work.have;
   by id;
run;

proc transpose data=work.have out=work.trans;
by id;
var fave_colors fave_seasons fave_songs;
run;

DATA WORK.temp ;
	SET WORK.trans;
	ValueCt=COUNTW(col1,',');
	DO i=1 TO ValueCt;
		value=SCAN(col1,i,',');
		OUTPUT;
	END;
RUN;

proc tabulate data=work.temp;
   class _name_ value;
   tables value='',
          _name_=''*n=''
          / box="Value" misstext='0'
   ;
run;

Either a custom format for the _name_ variable or and assignment in work.temp might get nicer  column headings in the final table.

Or separate tables for each original variable.

proc tabulate data=work.temp;
   class _name_ value;
   tables _name_,
          value='',
          n='Frequency'
          / box="Value" misstext='0'
   ;
   label _name_='Topic';
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 2894 views
  • 0 likes
  • 5 in conversation