Hello,
I am trying to summarize and average data in a large table with up to 2.5 million rows and 120 columns. The structure is like this: a single CASE is divided into thousands of REGIONS, each of which can have a RANK (usually ranging from -30 to 30 including zero) and more than a hundred measured different FEATURES in that region:
I need to convert this data into two tables. One of them lists down unique CASES with all possible RANK values as colmns and has the count of REGIONS with such a rank in a cell, like this:
The second table should be a summary list: mean and standard deviation of each FEATURE per RANK present in a CASE, like this:
Of course the two tables can be combined by simply adding a column RANK_COUNT in the second table.
I would appreciate your help,
Thank you,
Rokas
I think the most "standard way" would be to use the following approach:
data have;
call streaminit(42);
do Case="A","C","E","G","I","K","M","O","Q","S","U","W","Y";
do Region = 1 to rand("integer",200, 300);
Rank = rand("integer",-5, 5);
array f feature_01-feature_20;
do over f;
f=rand("normal");
end;
output;
end;
end;
run;
proc sql;
create table r as
select C as Case, Rank, count(region) as rank_count
from have(rename=(case=c))
group by C, Rank
order by C, Rank
;
quit;
proc summary data=have NWAY;
class Case Rank;
var feature_:;
output out=m_sd(drop=_:) MEAN(feature_:)= STD(feature_:)= / AUTONAME;
run;
proc transpose data=m_sd(obs=0) out=rename(keep=_name_);
var feature_:;
run;
filename f TEMP;
data _null_;
file f;
put 'data Combined; merge r m_sd; by Case Rank; rename';
do until(EOF);
set rename end=EOF;
_name2_ = catx("_",scan(_NAME_,3,"_"),scan(_NAME_,1,"_"),scan(_NAME_,2,"_"));
put _NAME_ "=" _NAME2_;
end;
put ';run;';
stop;
run;
%include f /source2;
[EDIT:] the only thing is that variables with mean and std will be called: feature_XX_mean and feature_XX_stdDev. If it is a big problem let me know.
Bart
Hi and welcome to SAS-communities!
If you want to get help faster provide some example data in usable form, not as pictures... the best as a data step with cards e.g.,
data have;
input x a b $ c d $;
cards;
1 11 A 100 ABC
2 22 B 200 DEF
3 33 C 300 GHI
4 44 D 400 JKL
;
run;
Read about it here: https://communities.sas.com/t5/help/faqpage/faq-category-id/posting/title/use_syntax_highlighter
Bart
I think the most "standard way" would be to use the following approach:
data have;
call streaminit(42);
do Case="A","C","E","G","I","K","M","O","Q","S","U","W","Y";
do Region = 1 to rand("integer",200, 300);
Rank = rand("integer",-5, 5);
array f feature_01-feature_20;
do over f;
f=rand("normal");
end;
output;
end;
end;
run;
proc sql;
create table r as
select C as Case, Rank, count(region) as rank_count
from have(rename=(case=c))
group by C, Rank
order by C, Rank
;
quit;
proc summary data=have NWAY;
class Case Rank;
var feature_:;
output out=m_sd(drop=_:) MEAN(feature_:)= STD(feature_:)= / AUTONAME;
run;
proc transpose data=m_sd(obs=0) out=rename(keep=_name_);
var feature_:;
run;
filename f TEMP;
data _null_;
file f;
put 'data Combined; merge r m_sd; by Case Rank; rename';
do until(EOF);
set rename end=EOF;
_name2_ = catx("_",scan(_NAME_,3,"_"),scan(_NAME_,1,"_"),scan(_NAME_,2,"_"));
put _NAME_ "=" _NAME2_;
end;
put ';run;';
stop;
run;
%include f /source2;
[EDIT:] the only thing is that variables with mean and std will be called: feature_XX_mean and feature_XX_stdDev. If it is a big problem let me know.
Bart
Hello @St_Rokas
If, in the last table, you want to "combine" the first and second table so that Rank_Count is vertical, don't create a table (as in your first table) such that the counts are horizontal. You are simply making your work more difficult. Create counts so that they are vertical.
PROC SUMMARY will create the final table, without first having to create the first two tables. As you have not provided data in a usable form (a "usable form" means working SAS data step code as shown by @yabwon, and not any other method) the following code is UNTESTED.
proc summary data=have nway;
class case region rank;
var feature_01 feature_02 feature_03;
output out=want n(feature_01)=rank_count mean=f1_mean f2_mean f3_mean
stddev=f1_std f2_std f3_std;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.