BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tonimitro
Calcite | Level 5

Hello,

I recently just started learning SAS and I am struggling on what to do on this particular task.

 

So, i have 2 data sets. One data set has a list of columns and their values and the other dataset has a list of the column names from data set 1 that i need to find the mean.

 

The data sets look like this. This is just a sample data, but i have hundreds of columns in data set 1 and hundreds of rows in dataset2.

 

data testData1;
infile datalines delimiter=',';
INPUT c1 c2 c3 c4 c5 c6 c7;
DATALINES;
13,5,4,6,78,12,4
65,45,78,78,12,45,12
66,66,65,65,66,88,66
11,10,4,5,6,7,5
11,10,4,5,6,7,20
;

/*contains the list of comma separated column names from dataset1
Example: I need to compute the mean for columns c1, c2, c5 etc
*/

data testData2;
infile datalines delimiter=','; INPUT colNames; DATALINES; c1,c2,c5 c4,c5,c7 c4,c6,c7 c1,c2,c3 ;

 

 

I want to create a third dataset that will compute the mean of each of the rows in Dataset 1 based on each row on Dataset 2.

 

Output should look like this and it should have as many columns as there are rows in Dataset 2 and as many rows as there are rows in dataset 1. In this example 5 rows and 4 columns

 

Each of the rows in testData 2 should be a column in dataset3 and each of the rows should be the mean values from test data 1.

 

So for Row 1 in tesdata1, the mean of c1,c2,c5 is (13+5+78)/3 = 32, row 2 for the same columns is (65+45+12)/3=40.66 etc.

Same thing for the next set of columns c4, c5, c7   (6+78+4)/3 =5.6 for row 1, and for row 2 is (78+78+12)/3=56

Same thing for the next set of columns c4, c6, c7   (6+12+4)/3 =7.3 for row 1 and for row 2 is (78+45+12)/3=45

 

 

 

data testData1;
INPUT c1-c2-c5 c4-c5-c7 c4-c6-c7 c1-c2-c3;
DATALINES;
32   5.6  7.3
40.6 56 45 etc etc etc ;

 

 

I just needs some guidance to get started as there are many things that sas can do. Not sure if this is something easy to do with just a dataset, proc means, or just a macro.

 

I hope someone can help.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

It is very easy for IML code.

 

data testData1;
infile datalines delimiter=','; 
INPUT c1 c2 c3 c4 c5 c6 c7;
DATALINES;
13,5,4,6,78,12,4
65,45,78,78,12,45,12
66,66,65,65,66,88,66
11,10,4,5,6,7,5
11,10,4,5,6,7,20
; 


data testData2;
infile datalines delimiter=',';
INPUT colNames $80.;
c1=scan(colnames,1,',');
c2=scan(colnames,2,',');
c3=scan(colnames,3,',');
drop colnames;
DATALINES;
c1,c2,c5
c4,c5,c7
c4,c6,c7
c1,c2,c3
; 
proc iml;
use testdata1;
read all var _num_ into x[c=vname];
close;

use testdata2;
read all var _char_ into y;
close;

want=j(nrow(x),nrow(y),.);
do i=1 to nrow(y);
 col=y[i,];
 want[,i]=x[,col][,:];
end;

colname=translate(strip(compbl(rowcat(y))),'_',' ');

create want from want[c=colname];
append from want;
close;
quit;

proc print;run;

View solution in original post

4 REPLIES 4
art297
Opal | Level 21

You have five rows of values, but only 4 rows of column names. Which values did you want to use to calculate the mean for the fifth row?

 

Art, CEO, AnalystFinder.com

art297
Opal | Level 21

If you were just missing the last row of column names, here is one way you can do what you want:

 

data testData1;
  infile datalines delimiter=','; 
  INPUT c1-c7;
  DATALINES;
13,5,4,6,78,12,4
65,45,78,78,12,45,12
66,66,65,65,66,88,66
11,10,4,5,6,7,5
11,10,4,5,6,7,20
; 

data testData2;
  infile datalines delimiter=',';
  INPUT colName1-colName3;
  DATALINES;
1,2,5
4,5,7
4,6,7
1,2,3
1,2,3
; 
data want ;*(drop=col:);
  set testData1;
  set testData2;
  array col(*) c1-c7;
  mean=mean(of col(colName1),col(colName2),col(colName3));
run;

Art, CEO, AnalystFinder.com

Ksharp
Super User

It is very easy for IML code.

 

data testData1;
infile datalines delimiter=','; 
INPUT c1 c2 c3 c4 c5 c6 c7;
DATALINES;
13,5,4,6,78,12,4
65,45,78,78,12,45,12
66,66,65,65,66,88,66
11,10,4,5,6,7,5
11,10,4,5,6,7,20
; 


data testData2;
infile datalines delimiter=',';
INPUT colNames $80.;
c1=scan(colnames,1,',');
c2=scan(colnames,2,',');
c3=scan(colnames,3,',');
drop colnames;
DATALINES;
c1,c2,c5
c4,c5,c7
c4,c6,c7
c1,c2,c3
; 
proc iml;
use testdata1;
read all var _num_ into x[c=vname];
close;

use testdata2;
read all var _char_ into y;
close;

want=j(nrow(x),nrow(y),.);
do i=1 to nrow(y);
 col=y[i,];
 want[,i]=x[,col][,:];
end;

colname=translate(strip(compbl(rowcat(y))),'_',' ');

create want from want[c=colname];
append from want;
close;
quit;

proc print;run;
tonimitro
Calcite | Level 5

Thank you Ksharp. That is exaclty what i was looking for.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 5633 views
  • 2 likes
  • 3 in conversation