BookmarkSubscribeRSS Feed
sdetlefs
Fluorite | Level 6

Hello,

 

I am looking for assistance in creating code for a new table in SAS. I want to do a spearman correlation for scores of two different variables that are listed in the same column.

 

Group

Variable

Score

1

Teamwork

3

2

Teamwork

4

3

Teamwork

5

1

Communication

4

2

Communication

3

3

Communication

5

 

So I was thinking I need to make a new table like this:

 

Group

Teamwork

Communication

1

3

3

2

4

4

3

5

5

1

4

4

2

3

3

3

5

5

 

Is that correct? How else can I separate these when they are listed in the same column…?

 

TIA

11 REPLIES 11
jimbarbour
Meteorite | Level 14

Well, as far as I can tell, you only have 3 scores for Teamwork and 3 scores for Communication.  Therefore you should only have three rows in your final output, yes?  Like the below:

jimbarbour_0-1627192070877.png

 

Here's some sample code, below, that will work generically with no matter how many values of Variable you have.  Right now, you have "Communication" and "Teamwork".  If, say, you added a couple more, for example "Innovation" and "Accuracy", this could should still work.

DATA	Have;
	LENGTH	Group		$1
			Variable	$13
			Score		$1
			;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	INPUT	Group		$
			Variable	$
			Score		$
			;
DATALINES;
1	Teamwork	3
2	Teamwork	4
3	Teamwork	5
1	Communication	4
2	Communication	3
3	Communication	5
;
RUN;

PROC	SQL	;
	SELECT	COUNT	(DISTINCT	Variable)
		INTO	:Nbr_of_Vars
		FROM	Have
		;
	SELECT	DISTINCT	Variable
		INTO	:Var1 - 
		FROM	Have
		;
QUIT;

PROC	SORT	DATA=Have;
	BY	Group;
RUN;

DATA	Want;
	DROP	Variable	Score;
	SET	Have;
		BY	Group;

	%MACRO	Generate_Var_Code;
		%LOCAL	i;
		%DO	i		=	1	%TO	&Nbr_of_Vars;
			RETAIN	&&Var&i;
		%END;
		%DO	i		=	1	%TO	&Nbr_of_Vars;
			IF	Variable	=	"&&Var&i"	THEN
				&&Var&i		=	Score;
			ELSE
		%END;
			PUTLOG	"ERROR:  Invalid Var name "  Variable=;
	%MEND	Generate_Var_Code;
	%Generate_Var_Code;

	IF	LAST.Group				THEN
		OUTPUT;
RUN;

Jim

sdetlefs
Fluorite | Level 6
Hi Jim

Thank you for your help. It is unfortunately more complicated. The variable column has 96 different options, two of which are teamwork and communication. The scores are from 1-5 and there are many different teams that are evaluated. So in total there are over 43,000 rows. A sample of these data and the excel file is below:

 

DateRoomGroupYearobemergmed_teamvariablegroup_id $Scorebad_actionDayexclude_medexclude_teamexclude_perf
11/13/201411201411701A41956100010
11/13/201421201411702A41956300000
11/13/201431201411703A41956200000
11/13/201441201411704A41956210000
11/13/201451201411705A41956300000

 

So I want to create columns of the group ID with scores (1-5) for each component from variable (96 total). Is that possible?

 

Thanks!

Sarah

jimbarbour
Meteorite | Level 14
@sdetlefs,

The code is intended to work for 1 to n values of Variable. The logic is written such that the program will determine the number of variables and create them.

I'm out right now, so it's hard for me to look at the data you posted on my little screen. If you have time, perhaps you could try running the code and posting the results if it works or the log if it doesn't.

Jim
jimbarbour
Meteorite | Level 14
Oh, wait. Nevermind. I just scrolled right, and I see that the data is more complex than I thought.

Can you post a little mock up of what you want the results to look like?

Jim
sdetlefs
Fluorite | Level 6

Hi Jim,

 

I was hoping to compare the scores from two different variables to evaluate whether there is a correlation. I figured I would need to create columns for each variable right?

 

If obemerg=1;

Group_ID

Teamwork score

Communication score

Variable 3……97

1A41956

3

1

1

1A41957

4

2

2

1A41958

5

5

5

1A41959

2

4

4

1A41960

5

3

3

1A41961

1

5

5

 

As of right now, the scores for teamwork and communication and every variable are all in the same column...

 

Thank you,

 

Sarah

jimbarbour
Meteorite | Level 14

@sdetlefs wrote:

I was hoping to compare the scores from two different variables to evaluate whether there is a correlation. I figured I would need to create columns for each variable right?

Hi, Sarah,

 

Yes, you would need to create a column for each value of Variable.  This code should work for that if all you need is a) a column for each value of Variable and b) the score.  If you need to include additional columns, those would have to be added.

 

However, there is one little problem.  The column "Variable" in your sample data has a numeric value, "70".  SAS generally doesn't want variables to start with a number.  You could try name literals, but I think it might be better to prefix each numeric value with a "G" (or some letter) so that valid SAS V7 column names can be created.  The "G" can be stripped off before creating a report if need be.

 

And do you want this by Group or Group_ID?  Your last post makes me think that you want this by Group_ID not Group.

 

Jim

sdetlefs
Fluorite | Level 6

Hi everyone,

 

I ended up transposing all of the data in SAS (hours of work) and my database is now working. I am sure that there is a way in SAS but this ended up being easier in the short run. Thank you Jim for telling me that the variables shouldn't be numeric, I would have struggled to figure this out for a long time otherwise.

 

Thank you for your assistance,

Sarah

Tom
Super User Tom
Super User

PROC TRANSPOSE should work fine for your case.  If your naming variable has values that don't start with letters just use the PREFIX= option.

proc transpose data=have out=want prefix=var ;
  by <grouping variables>;
  id <naming variable(s)>;
  var <value variable>;
run;

It should not take very long to run.

PaigeMiller
Diamond | Level 26

Do you mean you want this?

 

Group      Teamwork     Communication
1                 3                 4
2                 4                 3
3                 5                 5

If so, here is code that should work (untested). If that's not what you want, much more explanation is needed

 

data want;
    merge have(where=(variable='Teamwork') rename=(score=teamwork)) 
        have(where=(variable='Communication') rename=(score=communication));
    by group;
run;
--
Paige Miller
sdetlefs
Fluorite | Level 6

Hi Paige,

 

Unfortunately this did not work. I think that is what I am trying to do, though. At the end it stated that despite the code variables teamwork and total were not found.

 

I posted a sample of the database to explain the complexity and what I am trying to do in previous reply.

 

Thank you,

 

Sarah

Kurt_Bremser
Super User

Just sort and transpose:

data have;
infile datalines dlm=" " dsd truncover;
input group $ variable :$20. score;
datalines;
1 Teamwork 3
2 Teamwork 4
3 Teamwork 5
1 Communication 4
2 Communication 3
3 Communication 5
;

proc sort data=have;
by group;
run;

proc transpose
  data=have
  out=want (drop=_name_)
;
by group;
id variable;
var score;
run;

If your issue is more complex than this, post your more complicated example data in similar form (data step with datalines), and show us the expected result.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1384 views
  • 4 likes
  • 5 in conversation