BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
St_Rokas
Fluorite | Level 6

Hello,

I am trying to summarize and average data in a large table with up to 2.5 million rows and 120 columns. The structure is like this: a single CASE is divided into thousands of REGIONS, each of which can have a RANK (usually ranging from -30 to 30 including zero) and more than a hundred measured different FEATURES in that region:

Screenshot 2023-05-14 062716.png

 

I need to convert this data into two tables. One of them lists down unique CASES with all possible RANK values as colmns and has the count of REGIONS with such a rank in a cell, like this:

Screenshot 2023-05-14 064325.png

 

The second table should be a summary list: mean and standard deviation of each FEATURE per RANK present in a CASE, like this:

Screenshot 2023-05-14 065211.png

 

Of course the two tables can be combined by simply adding a column RANK_COUNT in the second table.

Screenshot 2023-05-14 071715.png

 

I would appreciate your help,

Thank you,

Rokas

1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Onyx | Level 15

I think the most "standard way" would be to use the following approach:

data have;
  call streaminit(42);
  do Case="A","C","E","G","I","K","M","O","Q","S","U","W","Y";
    do Region = 1 to rand("integer",200, 300);
      Rank = rand("integer",-5, 5);
      array f feature_01-feature_20;
      do over f;
        f=rand("normal");
      end;
      output;
    end;
  end;
run;


proc sql;
  create table r as
    select C as Case, Rank, count(region) as rank_count
    from have(rename=(case=c))
    group by C, Rank
    order by C, Rank
  ;
quit;


proc summary data=have NWAY;
  class Case Rank;
  var feature_:;
  output  out=m_sd(drop=_:) MEAN(feature_:)= STD(feature_:)= / AUTONAME; 
run;

proc transpose data=m_sd(obs=0) out=rename(keep=_name_);
  var feature_:; 
run;

filename f TEMP;
data _null_;
file f;

put 'data Combined; merge r m_sd; by Case Rank; rename';

do until(EOF);
  set rename end=EOF;
  _name2_ = catx("_",scan(_NAME_,3,"_"),scan(_NAME_,1,"_"),scan(_NAME_,2,"_"));
  put _NAME_ "=" _NAME2_;
end;

put ';run;';
stop; 
run;
%include f /source2;

[EDIT:] the only thing is that variables with mean and std will be called: feature_XX_mean and feature_XX_stdDev. If it is a big problem let me know.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

6 REPLIES 6
yabwon
Onyx | Level 15

Hi and welcome to SAS-communities! 

 

If you want to get help faster provide some example data in usable form, not as pictures... the best as a data step with cards e.g.,  

data have;
input x a b $ c d $;
cards;
1 11 A 100 ABC
2 22 B 200 DEF
3 33 C 300 GHI
4 44 D 400 JKL
;
run;

Read about it here: https://communities.sas.com/t5/help/faqpage/faq-category-id/posting/title/use_syntax_highlighter

 

Bart

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



St_Rokas
Fluorite | Level 6
Thank you for the tip, Bart - next time I will most certainly do that.
yabwon
Onyx | Level 15

I think the most "standard way" would be to use the following approach:

data have;
  call streaminit(42);
  do Case="A","C","E","G","I","K","M","O","Q","S","U","W","Y";
    do Region = 1 to rand("integer",200, 300);
      Rank = rand("integer",-5, 5);
      array f feature_01-feature_20;
      do over f;
        f=rand("normal");
      end;
      output;
    end;
  end;
run;


proc sql;
  create table r as
    select C as Case, Rank, count(region) as rank_count
    from have(rename=(case=c))
    group by C, Rank
    order by C, Rank
  ;
quit;


proc summary data=have NWAY;
  class Case Rank;
  var feature_:;
  output  out=m_sd(drop=_:) MEAN(feature_:)= STD(feature_:)= / AUTONAME; 
run;

proc transpose data=m_sd(obs=0) out=rename(keep=_name_);
  var feature_:; 
run;

filename f TEMP;
data _null_;
file f;

put 'data Combined; merge r m_sd; by Case Rank; rename';

do until(EOF);
  set rename end=EOF;
  _name2_ = catx("_",scan(_NAME_,3,"_"),scan(_NAME_,1,"_"),scan(_NAME_,2,"_"));
  put _NAME_ "=" _NAME2_;
end;

put ';run;';
stop; 
run;
%include f /source2;

[EDIT:] the only thing is that variables with mean and std will be called: feature_XX_mean and feature_XX_stdDev. If it is a big problem let me know.

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



PaigeMiller
Diamond | Level 26

Hello @St_Rokas 

 

If, in the last table, you want to "combine" the first and second table so that Rank_Count is vertical, don't create a table (as in your first table) such that the counts are horizontal. You are simply making your work more difficult. Create counts so that they are vertical.

 

PROC SUMMARY will create the final table, without first having to create the first two tables. As you have not provided data in a usable form (a "usable form" means working SAS data step code as shown by @yabwon, and not any other method) the following code is UNTESTED.

 

proc summary data=have nway;
    class case region rank;
    var feature_01 feature_02 feature_03;
    output out=want n(feature_01)=rank_count mean=f1_mean f2_mean f3_mean
        stddev=f1_std f2_std f3_std;
run;

 

--
Paige Miller
St_Rokas
Fluorite | Level 6
Thank you for taking time to help!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 823 views
  • 3 likes
  • 3 in conversation