NKormanik
Barite | Level 11

Please look over the following:

 

Indicator  Range                      Number
i_20304    >= 1.64214 AND < 2.03922     14
i_20304    >= 1.64214 AND < 2.43629     19
i_20304    >= 1.64214 AND < 2.83337     11
i_20304    >= 1.64214 AND < 3.23044     17
i_20304    >= 1.64214 AND < 3.62751     13

The Range refers to the particular Indicator, and Number of those.

 

Data set contains over 5 million rows of such.  Different Indicators, different Ranges, different Numbers of each.

 

I'm trying to summarize all this data.

 

Proc Freq works great as far as Indicator and Number.

 

But incorporating Range (a categorical variable) into the summary is a mystery.

 

One possibility would be to concatenate Indicator and Range.  Then use Proc Freq.

 

Any thoughts on this are greatly appreciated.  Particularly any other procedure that can handle "ranges" of data.

 

Nicholas Kormanik

 

 

Accepted Solution
Reeza
Super User

Split your range variable into four variables: upper_bound, lower_bound, upper_inequality, and lower_inequality.

Then your range of 

>= 1.64214 AND < 2.03922 

becomes

 

upper_bound = 2.03922

lower_bound = 1.64214

upper_inequality = LT

lower_inequality = GE
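A minimal DATA step sketch of that split. It assumes the input data set is called `have` and the output `want` (both names are hypothetical), and that every Range value follows the pattern `<operator> <number> AND <operator> <number>` with blank-separated tokens:

```sas
/* Sketch only: HAVE and WANT are hypothetical data set names. */
data want;
  set have;
  length lower_inequality upper_inequality $2;

  /* Blank-separated tokens in Range:
     1 = lower operator, 2 = lower value, 3 = AND,
     4 = upper operator, 5 = upper value */
  lower_bound = input(scan(Range, 2, ' '), 32.);
  upper_bound = input(scan(Range, 5, ' '), 32.);

  /* Translate the comparison symbols into mnemonics */
  select (scan(Range, 1, ' '));
    when ('>=') lower_inequality = 'GE';
    when ('>')  lower_inequality = 'GT';
    otherwise   lower_inequality = ' ';
  end;

  select (scan(Range, 4, ' '));
    when ('<=') upper_inequality = 'LE';
    when ('<')  upper_inequality = 'LT';
    otherwise   upper_inequality = ' ';
  end;
run;
```

Rows whose Range does not match the assumed pattern end up with missing bounds or blank inequality codes, so they are easy to find afterwards.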

 

Then you can use that to graph the ranges as a band or area in SGPLOT. 
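One way to draw ranges as horizontal bars is the HIGHLOW statement in PROC SGPLOT. The sketch below is an illustration, not necessarily what was intended above: the data set `ranges`, the variable `range_label`, and the use of Number as the bar label are all assumptions, and the five rows are copied from the question.

```sas
/* Sketch only: RANGES and RANGE_LABEL are hypothetical names. */
data ranges;
  input Indicator :$10. lower_bound upper_bound Number;
  length range_label $40;
  /* Rebuild a readable label for the category axis */
  range_label = catx(' ', '>=', lower_bound, 'AND <', upper_bound);
datalines;
i_20304 1.64214 2.03922 14
i_20304 1.64214 2.43629 19
i_20304 1.64214 2.83337 11
i_20304 1.64214 3.23044 17
i_20304 1.64214 3.62751 13
;
run;

proc sgplot data=ranges;
  /* One bar per range, from lower_bound to upper_bound,
     with Number printed at the high end of each bar */
  highlow y=range_label low=lower_bound high=upper_bound
          / type=bar highlabel=Number;
  yaxis discreteorder=data label='Range';
  xaxis label='Indicator value';
run;
```

With millions of rows you would likely summarize first to one row per Indicator and Range, then plot one Indicator at a time (for example with a WHERE or BY statement).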


@NKormanik wrote:

You may need 4 variables if you need to capture the < or <= as well.

Willing to add as many more variables as it takes.

What do you have in mind?

 

 

 


 


11 REPLIES
mkeintz
PROC Star

What is the summary you want supposed to look like?

NKormanik
Barite | Level 11

What is the summary you want supposed to look like?

How about ANY summary anyone can think of, for starters?

 

 

 

Reeza
Super User

Split your "range" into an upper and lower value so that you can summarize it in a different manner. You may need 4 variables if you need to capture the < or <= as well. 

NKormanik
Barite | Level 11

You may need 4 variables if you need to capture the < or <= as well.

Willing to add as many more variables as it takes.

What do you have in mind?

 

 

 


NKormanik
Barite | Level 11

I know we can graphically plot POINTS.  Nicely.

 

Is it possible in SAS to plot RANGES?

 

You can do that in Mathematica.

 

A graph would make a terrific summary....

 

Particularly, in the present case, if there's a little number next to each Range showing the total number of such ranges for that Indicator.

 

Like, for instance:

 

(54,321)

i_20304    >= 1.64214 AND < 2.03922

 

Remember, there are over 5 million such lines in the data set.  So, the above says there are 54,321 cases of this particular Indicator, and this particular Range.

 

 

ChrisNZ
Tourmaline | Level 20

> But incorporating Range (a categorical variable) into the summary is a mystery.

Why? What's different from the categorical variable  INDICATOR?

NKormanik
Barite | Level 11

Both are categorical variables, yes.  I haven't yet tried using Range in Proc Freq, thinking that it's so atypical and weird that there is probably some other way of handling it.
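For reference, a minimal sketch of treating Range as just another categorical variable in PROC FREQ. The data set names `have` and `range_counts` are hypothetical, and whether Number should be used as a weight depends on what it actually holds, which the thread does not settle:

```sas
/* Sketch only: HAVE and RANGE_COUNTS are hypothetical data set names. */
proc freq data=have noprint;
  /* One output row per Indicator/Range combination, with its row count
     in COUNT. NOPRINT avoids printing a very large table. */
  tables Indicator*Range / out=range_counts(drop=percent);

  /* Assumption: uncomment only if Number is a pre-aggregated count
     that should be summed instead of counting rows. */
  /* weight Number; */
run;
```

The resulting data set is small enough to sort, print, or plot, and no concatenation of Indicator and Range is needed.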

 

ChrisNZ
Tourmaline | Level 20
I have no idea what issue you want to solve.
NKormanik
Barite | Level 11

@ChrisNZ wrote:
I have no idea what issue you want to solve.

No worries, Chris.  Maybe next time.

 

What I'm attempting to do is to SUMMARIZE over 5 million rows of data, as given in part above.

 

Simply paging down that much data is nearly impossible to get a true sense of.  Somehow it all has to be summarized.  A set of graphs?  Using Proc Freq?  Some other....

 

ballardw
Super User

@NKormanik wrote:

@ChrisNZ wrote:
I have no idea what issue you want to solve.

No worries, Chris.  Maybe next time.

 

What I'm attempting to do is to SUMMARIZE over 5 million rows of data, as given in part above.

 

Simply paging down that much data is nearly impossible to get a true sense of.  Somehow it all has to be summarized.  A set of graphs?  Using Proc Freq?  Some other....

 


Hint: Provide a small example of the data and what the "summarized" version looks like.

Make sure the example data matches the rules you posted.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
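For instance, the five rows shown in the original question could be posted as a data step along these lines. The data set name `have` and the `|` delimiter are assumptions made for illustration:

```sas
/* Sketch only: HAVE is a hypothetical data set name.
   The rows are copied from the question. */
data have;
  infile datalines dlm='|' truncover;
  length Indicator $10 Range $40;
  input Indicator Range Number;
datalines;
i_20304|>= 1.64214 AND < 2.03922|14
i_20304|>= 1.64214 AND < 2.43629|19
i_20304|>= 1.64214 AND < 2.83337|11
i_20304|>= 1.64214 AND < 3.23044|17
i_20304|>= 1.64214 AND < 3.62751|13
;
run;
```

Using a delimiter other than a blank keeps the embedded spaces in Range intact when the value is read.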
