SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

How to find a max value while restricting number of rows

Accepted Solution Solved
Reply
Contributor
Posts: 25
Accepted Solution

How to find a max value while restricting number of rows

Hello,

I am trying to develop a code that find the maximum value in a range of 1800 rows and subsequently condenses the data down to that one row, and then move on to the next 1800 and do the same.

 

Overall I have about 30 data sets, each of which has over 10 million rows, so am trying to use PROC SQL or PROC EXPAND as opposed to data step processing.

 

Can someone provide some help?

Thanks!

- jkurka


Accepted Solutions
Solution
‎02-12-2016 10:21 AM
Respected Advisor
Posts: 3,799

Re: How to find a max value while restricting number of rows

PROC SUMMARY has features that make this easy.  First you can use a view to add the group variable to your data, virtually speaking.  The IDGROUP OUTPUT statement option finds the MAX of age for each BY G(roup) created in the VIEW and the OUT sub-option defines the variable(s) to write to the data set that correspond to the MAX(AGE).  The OBS sub-option creates the _OBS_ variable with the observation number.

 

%let gsize=3;

data rowsv / view=rowsv;
   set sashelp.class;
   g + mod(_n_,&gsize) eq 1;
   run;
proc summary data=rowsv;
   by g;
   output out=max&gsize idgroup(max(age) obs out(name--weight)=);
   run;

Capture.PNG

View solution in original post


All Replies
Super User
Posts: 19,862

Re: How to find a max value while restricting number of rows

Is there some variable that identifies the 1800 rows? 

Contributor
Posts: 25

Re: How to find a max value while restricting number of rows

Using PROC SQL and the monotonic () function I created a 'rows' variable, but that is all besides the variable (column) that has the data I want to aggregate.

Contributor
Posts: 25

Re: How to find a max value while restricting number of rows

I could provide an example for visualization. I'm sure that would help!

Say I'm starting with this and want to find the max within a range of each 3 variables:

rows         var

1              0

2              2

3              4

4              1

5              4

6              2

7              0

8              5

9              1

 

I would like my output to show:

row          var

1              4    --> max of first set of 3

4              4    --> max of next set of 3

7              5    --> max of next set 3

Super User
Posts: 3,260

Re: How to find a max value while restricting number of rows

It's simple in a DATA step:

 

data want;
  keep row maximum;
  set have;
  retain maximum 0;
  maximum = max(maximum, var);
  if mod(_n_, 1800) = 0 then do;
    row = _n_;
    output; 
    maximum = 0;
  end;
run;

Note though that the data is read in the order it is stored. If the stored order changes then so do the maximums.

Contributor
Posts: 25

Re: How to find a max value while restricting number of rows

Won't this take a really long time to go through 10 million rows?

Respected Advisor
Posts: 3,799

Re: How to find a max value while restricting number of rows

The actual time it takes depends on many factors. It should take about the same for a procedure to read the data as data step.  Ideally you want to make one pass of the data.

Solution
‎02-12-2016 10:21 AM
Respected Advisor
Posts: 3,799

Re: How to find a max value while restricting number of rows

PROC SUMMARY has features that make this easy.  First you can use a view to add the group variable to your data, virtually speaking.  The IDGROUP OUTPUT statement option finds the MAX of age for each BY G(roup) created in the VIEW and the OUT sub-option defines the variable(s) to write to the data set that correspond to the MAX(AGE).  The OBS sub-option creates the _OBS_ variable with the observation number.

 

%let gsize=3;

data rowsv / view=rowsv;
   set sashelp.class;
   g + mod(_n_,&gsize) eq 1;
   run;
proc summary data=rowsv;
   by g;
   output out=max&gsize idgroup(max(age) obs out(name--weight)=);
   run;

Capture.PNG

Contributor
Posts: 25

Re: How to find a max value while restricting number of rows

Posted in reply to data_null__

Thank you data_null__. Yes I'm keeping track of both posts. Wasn't aware of the duplicate until now. I'm giving your solution a try now!

Respected Advisor
Posts: 3,799

Re: How to find a max value while restricting number of rows

It would be a good idea to show an example of your data.   If you have date-time and you want to group accordingly it might be better to use the date-time and a format or other rounding technique instead of observations number.

Contributor
Posts: 25

Re: How to find a max value while restricting number of rows

Posted in reply to data_null__

The data aren't timestamped. I've attached an excel example of a chuck of one file. As you can see it's simple, and just wanting the maximum value of every 1800 rows of the flt_z variable (last column).

 

I have the time that corresponds to the first row, but won't timestamp the full dataset due to it taking so long.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 635 views
  • 0 likes
  • 4 in conversation