BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AK100
Pyrite | Level 9

Hello All,

I have a very interesting case in SAS EG which I would like to solve and which might benefit us all to learn from. I have a big dataset in SAS EG which I use to make charts in VA. These charts (most of them) have a specific trend in a time-frame of 1 minute. The beginning of the chart have values around 17-18, within a few seconds, the values drop to approximately 0 - 1. A few seconds later the value rises again to 17-18.  So the trend starts high goes low and ends high. The data-set looks like this (but much much bigger):

 

Time     --    Value

00:000  --   17.109

00:025  --   17.109

00:040  --   17.106

00:061  --   17.034

00:071  --    17.056

00:151  --    0.431

00:173  --    0.432

00:182  --    0.320

00:195  --    0.650

00:234  --   17.109

00:244  --   17.109

00:256  --   17.106

00:278  --   17.034

00:291  --   17.056

 

Sometimes the chart doesn't fit the trend because the dataset sometimes contains to high or to low values at unwanted places. Now I only want to select the data in EG which fits the trend so that I don't get weird charts in SAS VA. The trend I'm looking for actually has 3 sections: Begin - Middle - End. In the Begin the values must be HIGH between 17 and 18. In the Middle the values must be LOW between 0 and 1. In the End the values must again be HIGH between 17 and 18.

If the values are like I mentioned above, I will get a Bathtub curve which is what I want.

 

I was thinking about selecting the average of the highest values (the 17's) and the average of the lowest values (the 0's) and then maybe filter which are the closest to those values? I think this might not be accurate enough. I have also read something about the SHEWHART Procedure (I don't know what this is or how this works). 

 

I appreciate your help. Thank you in advance. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Taking your first dataset (the second has different values), this code removes rows 5, 18 and 30:

data have;
  length Value 8 Location $40 Time 8;
  infile cards dsd dlm=':' truncover;
  input Value Location Minutes Seconds;
  time = minutes * 60 + seconds;
  format time time11.2;
  drop minutes seconds;
cards;
17.000: Germany: 1:45.0
17.001: Germany:  1:45.1
17.220: Germany:  1:45.2
18.901: Germany: 1:45.3
1.091: Germany: 1:45.4
19.021: Germany: 1:45.5
16.023: Germany: 1:45.6
17.504: Germany: 1:45.7
17.600: Germany: 1:45.8
17.760: Germany: 1:45.9
19.759: Germany: 1:45.10
6.000: Germany: 1:45.11
2.454: Germany: 1:45.12
1.665: Germany: 1:45.13
0.435: Germany: 1:45.14
0.091: Germany: 1:45.15
0.654: Germany: 1:45.16
53.324: Germany: 1:45.17
0.554: Germany: 1:45.18
1.670: Germany: 1:45.19
3.050: Germany: 1:45.20
5.659: Germany: 1:45.21
17.540:Germany: 1:45.22
17.67: Germany: 1:45.23
17.546:Germany:  1:45.24
18.434:Germany:  1:45.25
16.655:Germany:  1:45.26
19.315:Germany:  1:45.27
16.323:Germany:  1:45.28
1.545:Germany:  1:45.29
19.323:Germany:  1:45.30
17.234:Germany:  1:45.31
22.215:Germany:  1:45.32
;

data want;
merge
  have
  have (firstobs=2 keep=value rename=(value=nvalue))
;
lvalue = lag(value);
if lvalue ne . and nvalue ne .
then do;
  if
    lvalue <= 6 and nvalue <= 6 and value > 6
  then del = 1;
  else if
    lvalue > 6 and nvalue > 6 and
    (value < 6 or value > (nvalue + lvalue))
  then del = 1;
end;
if not del;
drop nvalue lvalue del;
run;

View solution in original post

19 REPLIES 19
PaigeMiller
Diamond | Level 26

This means for example that the values cannot start at low numbers like 1 for or have middle values around the 10 or end with to low values. 

 

This is a rather vague description of what constitutes the problem you are looking for. Can you be much more specific about what exactly would be the trend you are looking for (or not looking for)?

 

I was thinking about selecting the average of the highest values (the 17's) and the average of the lowest values (the 0's) and then maybe filter which are the closest to those values? I think this might not be accurate enough.

 

Same comments as above.

 

I have also read something about the SHEWHART Procedure (I don't know what this is or how this works). 

 

Shewhart will not find these types of trends that you are discussing. 

--
Paige Miller
AK100
Pyrite | Level 9

The trend I'm looking for actually has 3 sections: Begin - Middle - End
In the Begin the values must be HIGH between 17 and 18. In the Middle the values must be LOW between 0 and 1. In the End the values must again be HIGH between 17 and 18.

If the values are like I mentioned above, I will get a Bathtub curve which is what I want.

AK100
Pyrite | Level 9

Hello All,

I have a very interesting case in SAS EG which I would like to solve and which might benefit us all to learn from. I have a big dataset in SAS EG which I use to make charts in VA. These charts (most of them) have a specific trend in a time-frame of 1 minute. The beginning of the chart have values around 17-18, within a few seconds, the values drop to approximately 0 - 1. A few seconds later the value rises again to 17-18.  So the trend starts high goes low and ends high. The data-set looks like this (but much much bigger):

 

Time     --    Value

00:000  --   17.109

00:025  --   17.109

00:040  --   17.106

00:061  --   17.034

00:071  --    17.056

00:151  --    0.431

00:173  --    0.432

00:182  --    0.320

00:195  --    0.650

00:234  --   17.109

00:244  --   17.109

00:256  --   17.106

00:278  --   17.034

00:291  --   17.056

 

Sometimes the chart doesn't fit the trend because the dataset sometimes contains to high or to low values at unwanted places. Now I only want to select the data in EG which fits the trend so that I don't get weird charts in SAS VA. The trend I'm looking for actually has 3 sections: Begin - Middle - End. In the Begin the values must be HIGH between 17 and 18. In the Middle the values must be LOW between 0 and 1. In the End the values must again be HIGH between 17 and 18.

If the values are like I mentioned above, I will get a Bathtub curve which is what I want.

 

I was thinking about selecting the average of the highest values (the 17's) and the average of the lowest values (the 0's) and then maybe filter which are the closest to those values? I think this might not be accurate enough. I have also read something about the SHEWHART Procedure (I don't know what this is or how this works). 

 

I appreciate your help. Thank you in advance. 

 

Kurt_Bremser
Super User

Since you already asked that identical question, I merged this back into it.

 

Please supply usable example data (IN A DATA STEP WITH DATALINES; DO NOT SKIP THIS!) that illustrates your issue sufficiently, and point out the observations you want deleted, and why.

AK100
Pyrite | Level 9
I don't think a data step will help for this one as the trend starts taking shape after more then thousands of rows. The data set is to big to just show you a small portion of it. So it wont make sense if I just send you 10 or 100 observations in a date step. It also contains a lot of private data which I cannot just throw on the forum.
Kurt_Bremser
Super User

You already showed data, and I can see nothing critical in there. But it does not sufficiently illustrate your issue, so you need to expand it (so we see which values you want deleted), and present the data in a readily usable form, so we can immediately start testing. Help us to help you.

If you're not willing to help us, why should we invest time in helping you?

AK100
Pyrite | Level 9

I know there is nothing critical in there, but can you say the same for millions of rows? I can assure you there are values in those millions who don't fit the trend. I want to filter them out as a precaution.

 

Its not that I don't want to help you, but I cant give give you (private) data which contains millions of rows, and to make 1 trend chart out of that millions rows I need approximately 1500+ rows. That means a data step would only give you usable data if I make a data step of 1500 rows, is that what you want? Do you want me to send you a data step of 1500 rows?

Kurt_Bremser
Super User

Take the lines you already gave us, add a few lines that you would want excluded, and post that. If these 14 lines were enough to show an "OK" example, 20 lines should be enough to create a "not OK" example. Once we have code to solve this, you can try it out on your data, and come back with the special cases that the code did not cover.

 

But right now nobody here has a clue what you want, and that is why you have not gotten a usable answer yet.

PaigeMiller
Diamond | Level 26

@AK100 wrote:

I know there is nothing critical in there, but can you say the same for millions of rows? I can assure you there are values in those millions who don't fit the trend. I want to filter them out as a precaution.

 

Its not that I don't want to help you, but I cant give give you (private) data which contains millions of rows, and to make 1 trend chart out of that millions rows I need approximately 1500+ rows. That means a data step would only give you usable data if I make a data step of 1500 rows, is that what you want? Do you want me to send you a data step of 1500 rows?


We still don't really know what "trend" you want to find. And no, by the way, a data step of 1500 rows would probably not help, as it is one instance of such a trend, and we would need to see both matching trends (plural) and non-matching trends (plural). Maybe a plot showing some matching trends (plural) and some non-matching trends (plural) would move everything in the direction of helping us understand what you want. The plot can have 1500 data points (or more!) in each "trend".

--
Paige Miller
AK100
Pyrite | Level 9

@PaigeMiller @Kurt_Bremser 

Okay guys I have tried to figure out some code for you. If you run the following code you will get (the mini version) of my data-set.

data want1 ;
  length Value Location $40 Time 8 ;
  infile cards dsd dlm=':' truncover ;
  input Value Location Minutes Seconds ;
   time = minutes*60 + seconds ;
  format time time11.2 ;
cards;
17.000: Germany: 1:45.0
17.001: Germany:  1:45.1
17.220: Germany:  1:45.2
18.901: Germany: 1:45.3
1.091: Germany: 1:45.4
19.021: Germany: 1:45.5
16.023: Germany: 1:45.6
17.504: Germany: 1:45.7
17.600: Germany: 1:45.8
17.760: Germany: 1:45.9
19.759: Germany: 1:45.10
6.000: Germany: 1:45.11
2.454: Germany: 1:45.12
1.665: Germany: 1:45.13
0.435: Germany: 1:45.14
0.091: Germany: 1:45.15
0.654: Germany: 1:45.16
53.324: Germany: 1:45.17
0.554: Germany: 1:45.18
1.670: Germany: 1:45.19
3.050: Germany: 1:45.20
5.659: Germany: 1:45.21
17.540:Germany: 1:45.22
17.67: Germany: 1:45.23
17.546:Germany:  1:45.24
18.434:Germany:  1:45.25
16.655:Germany:  1:45.26
19.315:Germany:  1:45.27
16.323:Germany:  1:45.28
1.545:Germany:  1:45.29
19.323:Germany:  1:45.30
17.234:Germany:  1:45.31
22.215:Germany:  1:45.32
;;;;


This is the preferred trend I want to see: The first rows should have a value between 15 - 25, then in the middle rows the values need to drop between 0 and 5. In the last rows the values should rise again between 15 - 25 (High,Low,High). If I have this trend I can make my bathtub curve in VA. 

 

However,  As you can also see in the output. Row 5, 18 and 30 have to high or to low values who don't fit the trend. These need to be out-filtered, so I need to take just the remaining rows. Please note that directly after this bunch of rows a new bunch starts again which will give me a new bathtub curve etc. In total I have more then 50 million rows. I wish all of them could be filtered the same way. 

 

I've tried my best to make things as clear as possible, hope this does okay. Thanks in advance.

PaigeMiller
Diamond | Level 26

If we are going to pick out a trend from data that contains many trends, from a statistical point of view, we need to see examples (plural) of the trends you WANT and examples (plural) of the trends you DON'T WANT.

--
Paige Miller
AK100
Pyrite | Level 9

@PaigeMiller 

 

I don't think you understand what I want. In the code I have sent you there is data I WANT and data I DON'T want. Like I said row 5 ,18 and 30 need to be out-filtered. The rest can stay.  It doesn't make sense to add a trend which fits my criteria as it is already added in the previous code. But as you wish: here is a trend that DOES fit my criteria and which will give me the wished bathtub curve. 

data want1 ;
  length Value Location $40 Time 8 ;
  infile cards dsd dlm=':' truncover ;
  input Value Location Minutes Seconds ;
   time = minutes*60 + seconds ;
  format time time11.2 ;
cards;
19.434: Germany: 2:45.0
16.545: Germany:  2:45.1
17.354: Germany:  2:45.2
17.943: Germany: 2:45.3
17.091: Germany: 2:45.4
17.545: Germany: 2:45.5
16.434: Germany: 2:45.6
17.504: Germany: 2:45.7
17.545: Germany: 2:45.8
17.542: Germany: 2:45.9
19.759: Germany: 2:45.10
3.000: Germany: 2:45.11
2.434: Germany: 2:45.12
1.545: Germany: 2:45.13
0.322: Germany: 2:45.14
0.654: Germany: 2:45.15
0.537: Germany: 2:45.16
53.786: Germany: 2:45.17
0.434: Germany: 2:45.18
17.646: Germany: 2:45.19
3.434: Germany: 2:45.20
5.659: Germany: 2:45.21
17.540:Germany: 2:45.22
20.434: Germany: 2:45.23
24.433:Germany:  2:45.24
16.545:Germany:  2:45.25
16.656:Germany:  2:45.26
19.315:Germany:  2:45.27
16.323:Germany:  2:45.28
16.545:Germany:  2:45.29
19.544:Germany:  2:45.30
19.234:Germany:  2:45.31
22.655:Germany:  2:45.32
;;;;

 

Kurt_Bremser
Super User

Do you have a maximum value for the "low" part of the cycle that is the same for all trends, or are those dynamic?

(say, all "lows" for "Germany" are below 6, but for "UK" they are below 25)

Likewise for the "high" part, can one define a fixed minimum value?

 

Edit: and is there an overall maximum value?

AK100
Pyrite | Level 9
No All lows must be below 6 for Germany as wel as the other locations. The same applies for the High parts. There is not really an overall maximum.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 19 replies
  • 3228 views
  • 2 likes
  • 3 in conversation