BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jkurka
Fluorite | Level 6

Hello,

I am trying to develop a code that find the maximum value in a range of 1800 rows and subsequently condenses the data down to that one row, and then move on to the next 1800 and do the same.

 

Overall I have about 30 data sets, each of which has over 10 million rows, so am trying to use PROC SQL or PROC EXPAND as opposed to data step processing.

 

Can someone provide some help?

Thanks!

- jkurka

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

PROC SUMMARY has features that make this easy.  First you can use a view to add the group variable to your data, virtually speaking.  The IDGROUP OUTPUT statement option finds the MAX of age for each BY G(roup) created in the VIEW and the OUT sub-option defines the variable(s) to write to the data set that correspond to the MAX(AGE).  The OBS sub-option creates the _OBS_ variable with the observation number.

 

%let gsize=3;

data rowsv / view=rowsv;
   set sashelp.class;
   g + mod(_n_,&gsize) eq 1;
   run;
proc summary data=rowsv;
   by g;
   output out=max&gsize idgroup(max(age) obs out(name--weight)=);
   run;

Capture.PNG

View solution in original post

10 REPLIES 10
Reeza
Super User

Is there some variable that identifies the 1800 rows? 

jkurka
Fluorite | Level 6

Using PROC SQL and the monotonic () function I created a 'rows' variable, but that is all besides the variable (column) that has the data I want to aggregate.

jkurka
Fluorite | Level 6

I could provide an example for visualization. I'm sure that would help!

Say I'm starting with this and want to find the max within a range of each 3 variables:

rows         var

1              0

2              2

3              4

4              1

5              4

6              2

7              0

8              5

9              1

 

I would like my output to show:

row          var

1              4    --> max of first set of 3

4              4    --> max of next set of 3

7              5    --> max of next set 3

SASKiwi
PROC Star

It's simple in a DATA step:

 

data want;
  keep row maximum;
  set have;
  retain maximum 0;
  maximum = max(maximum, var);
  if mod(_n_, 1800) = 0 then do;
    row = _n_;
    output; 
    maximum = 0;
  end;
run;

Note though that the data is read in the order it is stored. If the stored order changes then so do the maximums.

jkurka
Fluorite | Level 6

Won't this take a really long time to go through 10 million rows?

data_null__
Jade | Level 19

The actual time it takes depends on many factors. It should take about the same for a procedure to read the data as data step.  Ideally you want to make one pass of the data.

data_null__
Jade | Level 19

PROC SUMMARY has features that make this easy.  First you can use a view to add the group variable to your data, virtually speaking.  The IDGROUP OUTPUT statement option finds the MAX of age for each BY G(roup) created in the VIEW and the OUT sub-option defines the variable(s) to write to the data set that correspond to the MAX(AGE).  The OBS sub-option creates the _OBS_ variable with the observation number.

 

%let gsize=3;

data rowsv / view=rowsv;
   set sashelp.class;
   g + mod(_n_,&gsize) eq 1;
   run;
proc summary data=rowsv;
   by g;
   output out=max&gsize idgroup(max(age) obs out(name--weight)=);
   run;

Capture.PNG

jkurka
Fluorite | Level 6

Thank you data_null__. Yes I'm keeping track of both posts. Wasn't aware of the duplicate until now. I'm giving your solution a try now!

data_null__
Jade | Level 19

It would be a good idea to show an example of your data.   If you have date-time and you want to group accordingly it might be better to use the date-time and a format or other rounding technique instead of observations number.

jkurka
Fluorite | Level 6

The data aren't timestamped. I've attached an excel example of a chuck of one file. As you can see it's simple, and just wanting the maximum value of every 1800 rows of the flt_z variable (last column).

 

I have the time that corresponds to the first row, but won't timestamp the full dataset due to it taking so long.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 2284 views
  • 0 likes
  • 4 in conversation