BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tonimitro
Calcite | Level 5

Hello,

I recently just started learning SAS and I am struggling on what to do on this particular task.

 

So, i have 2 data sets. One data set has a list of columns and their values and the other dataset has a list of the column names from data set 1 that i need to find the mean.

 

The data sets look like this. This is just a sample data, but i have hundreds of columns in data set 1 and hundreds of rows in dataset2.

 

data testData1;
infile datalines delimiter=',';
INPUT c1 c2 c3 c4 c5 c6 c7;
DATALINES;
13,5,4,6,78,12,4
65,45,78,78,12,45,12
66,66,65,65,66,88,66
11,10,4,5,6,7,5
11,10,4,5,6,7,20
;

/*contains the list of comma separated column names from dataset1
Example: I need to compute the mean for columns c1, c2, c5 etc
*/

data testData2;
infile datalines delimiter=','; INPUT colNames; DATALINES; c1,c2,c5 c4,c5,c7 c4,c6,c7 c1,c2,c3 ;

 

 

I want to create a third dataset that will compute the mean of each of the rows in Dataset 1 based on each row on Dataset 2.

 

Output should look like this and it should have as many columns as there are rows in Dataset 2 and as many rows as there are rows in dataset 1. In this example 5 rows and 4 columns

 

Each of the rows in testData 2 should be a column in dataset3 and each of the rows should be the mean values from test data 1.

 

So for Row 1 in tesdata1, the mean of c1,c2,c5 is (13+5+78)/3 = 32, row 2 for the same columns is (65+45+12)/3=40.66 etc.

Same thing for the next set of columns c4, c5, c7   (6+78+4)/3 =5.6 for row 1, and for row 2 is (78+78+12)/3=56

Same thing for the next set of columns c4, c6, c7   (6+12+4)/3 =7.3 for row 1 and for row 2 is (78+45+12)/3=45

 

 

 

data testData1;
INPUT c1-c2-c5 c4-c5-c7 c4-c6-c7 c1-c2-c3;
DATALINES;
32   5.6  7.3
40.6 56 45 etc etc etc ;

 

 

I just needs some guidance to get started as there are many things that sas can do. Not sure if this is something easy to do with just a dataset, proc means, or just a macro.

 

I hope someone can help.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It is very easy for IML code.

 

data testData1;
infile datalines delimiter=','; 
INPUT c1 c2 c3 c4 c5 c6 c7;
DATALINES;
13,5,4,6,78,12,4
65,45,78,78,12,45,12
66,66,65,65,66,88,66
11,10,4,5,6,7,5
11,10,4,5,6,7,20
; 


data testData2;
infile datalines delimiter=',';
INPUT colNames $80.;
c1=scan(colnames,1,',');
c2=scan(colnames,2,',');
c3=scan(colnames,3,',');
drop colnames;
DATALINES;
c1,c2,c5
c4,c5,c7
c4,c6,c7
c1,c2,c3
; 
proc iml;
use testdata1;
read all var _num_ into x[c=vname];
close;

use testdata2;
read all var _char_ into y;
close;

want=j(nrow(x),nrow(y),.);
do i=1 to nrow(y);
 col=y[i,];
 want[,i]=x[,col][,:];
end;

colname=translate(strip(compbl(rowcat(y))),'_',' ');

create want from want[c=colname];
append from want;
close;
quit;

proc print;run;

View solution in original post

4 REPLIES 4
art297
Opal | Level 21

You have five rows of values, but only 4 rows of column names. Which values did you want to use to calculate the mean for the fifth row?

 

Art, CEO, AnalystFinder.com

art297
Opal | Level 21

If you were just missing the last row of column names, here is one way you can do what you want:

 

data testData1;
  infile datalines delimiter=','; 
  INPUT c1-c7;
  DATALINES;
13,5,4,6,78,12,4
65,45,78,78,12,45,12
66,66,65,65,66,88,66
11,10,4,5,6,7,5
11,10,4,5,6,7,20
; 

data testData2;
  infile datalines delimiter=',';
  INPUT colName1-colName3;
  DATALINES;
1,2,5
4,5,7
4,6,7
1,2,3
1,2,3
; 
data want ;*(drop=col:);
  set testData1;
  set testData2;
  array col(*) c1-c7;
  mean=mean(of col(colName1),col(colName2),col(colName3));
run;

Art, CEO, AnalystFinder.com

Ksharp
Super User

It is very easy for IML code.

 

data testData1;
infile datalines delimiter=','; 
INPUT c1 c2 c3 c4 c5 c6 c7;
DATALINES;
13,5,4,6,78,12,4
65,45,78,78,12,45,12
66,66,65,65,66,88,66
11,10,4,5,6,7,5
11,10,4,5,6,7,20
; 


data testData2;
infile datalines delimiter=',';
INPUT colNames $80.;
c1=scan(colnames,1,',');
c2=scan(colnames,2,',');
c3=scan(colnames,3,',');
drop colnames;
DATALINES;
c1,c2,c5
c4,c5,c7
c4,c6,c7
c1,c2,c3
; 
proc iml;
use testdata1;
read all var _num_ into x[c=vname];
close;

use testdata2;
read all var _char_ into y;
close;

want=j(nrow(x),nrow(y),.);
do i=1 to nrow(y);
 col=y[i,];
 want[,i]=x[,col][,:];
end;

colname=translate(strip(compbl(rowcat(y))),'_',' ');

create want from want[c=colname];
append from want;
close;
quit;

proc print;run;
tonimitro
Calcite | Level 5

Thank you Ksharp. That is exaclty what i was looking for.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 5294 views
  • 2 likes
  • 3 in conversation