BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
St_Rokas
Fluorite | Level 6

Hello,

I am trying to summarize and average data in a large table with up to 2.5 million rows and 120 columns. The structure is like this: a single CASE is divided into thousands of REGIONS, each of which can have a RANK (usually ranging from -30 to 30 including zero) and more than a hundred measured different FEATURES in that region:

Screenshot 2023-05-14 062716.png

 

I need to convert this data into two tables. One of them lists down unique CASES with all possible RANK values as colmns and has the count of REGIONS with such a rank in a cell, like this:

Screenshot 2023-05-14 064325.png

 

The second table should be a summary list: mean and standard deviation of each FEATURE per RANK present in a CASE, like this:

Screenshot 2023-05-14 065211.png

 

Of course the two tables can be combined by simply adding a column RANK_COUNT in the second table.

Screenshot 2023-05-14 071715.png

 

I would appreciate your help,

Thank you,

Rokas

1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Amethyst | Level 16

I think the most "standard way" would be to use the following approach:

data have;
  call streaminit(42);
  do Case="A","C","E","G","I","K","M","O","Q","S","U","W","Y";
    do Region = 1 to rand("integer",200, 300);
      Rank = rand("integer",-5, 5);
      array f feature_01-feature_20;
      do over f;
        f=rand("normal");
      end;
      output;
    end;
  end;
run;


proc sql;
  create table r as
    select C as Case, Rank, count(region) as rank_count
    from have(rename=(case=c))
    group by C, Rank
    order by C, Rank
  ;
quit;


proc summary data=have NWAY;
  class Case Rank;
  var feature_:;
  output  out=m_sd(drop=_:) MEAN(feature_:)= STD(feature_:)= / AUTONAME; 
run;

proc transpose data=m_sd(obs=0) out=rename(keep=_name_);
  var feature_:; 
run;

filename f TEMP;
data _null_;
file f;

put 'data Combined; merge r m_sd; by Case Rank; rename';

do until(EOF);
  set rename end=EOF;
  _name2_ = catx("_",scan(_NAME_,3,"_"),scan(_NAME_,1,"_"),scan(_NAME_,2,"_"));
  put _NAME_ "=" _NAME2_;
end;

put ';run;';
stop; 
run;
%include f /source2;

[EDIT:] the only thing is that variables with mean and std will be called: feature_XX_mean and feature_XX_stdDev. If it is a big problem let me know.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

6 REPLIES 6
yabwon
Amethyst | Level 16

Hi and welcome to SAS-communities! 

 

If you want to get help faster provide some example data in usable form, not as pictures... the best as a data step with cards e.g.,  

data have;
input x a b $ c d $;
cards;
1 11 A 100 ABC
2 22 B 200 DEF
3 33 C 300 GHI
4 44 D 400 JKL
;
run;

Read about it here: https://communities.sas.com/t5/help/faqpage/faq-category-id/posting/title/use_syntax_highlighter

 

Bart

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



St_Rokas
Fluorite | Level 6
Thank you for the tip, Bart - next time I will most certainly do that.
yabwon
Amethyst | Level 16

I think the most "standard way" would be to use the following approach:

data have;
  call streaminit(42);
  do Case="A","C","E","G","I","K","M","O","Q","S","U","W","Y";
    do Region = 1 to rand("integer",200, 300);
      Rank = rand("integer",-5, 5);
      array f feature_01-feature_20;
      do over f;
        f=rand("normal");
      end;
      output;
    end;
  end;
run;


proc sql;
  create table r as
    select C as Case, Rank, count(region) as rank_count
    from have(rename=(case=c))
    group by C, Rank
    order by C, Rank
  ;
quit;


proc summary data=have NWAY;
  class Case Rank;
  var feature_:;
  output  out=m_sd(drop=_:) MEAN(feature_:)= STD(feature_:)= / AUTONAME; 
run;

proc transpose data=m_sd(obs=0) out=rename(keep=_name_);
  var feature_:; 
run;

filename f TEMP;
data _null_;
file f;

put 'data Combined; merge r m_sd; by Case Rank; rename';

do until(EOF);
  set rename end=EOF;
  _name2_ = catx("_",scan(_NAME_,3,"_"),scan(_NAME_,1,"_"),scan(_NAME_,2,"_"));
  put _NAME_ "=" _NAME2_;
end;

put ';run;';
stop; 
run;
%include f /source2;

[EDIT:] the only thing is that variables with mean and std will be called: feature_XX_mean and feature_XX_stdDev. If it is a big problem let me know.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



PaigeMiller
Diamond | Level 26

Hello @St_Rokas 

 

If, in the last table, you want to "combine" the first and second table so that Rank_Count is vertical, don't create a table (as in your first table) such that the counts are horizontal. You are simply making your work more difficult. Create counts so that they are vertical.

 

PROC SUMMARY will create the final table, without first having to create the first two tables. As you have not provided data in a usable form (a "usable form" means working SAS data step code as shown by @yabwon, and not any other method) the following code is UNTESTED.

 

proc summary data=have nway;
    class case region rank;
    var feature_01 feature_02 feature_03;
    output out=want n(feature_01)=rank_count mean=f1_mean f2_mean f3_mean
        stddev=f1_std f2_std f3_std;
run;

 

--
Paige Miller
St_Rokas
Fluorite | Level 6
Thank you for taking time to help!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1796 views
  • 3 likes
  • 3 in conversation