user445
Calcite | Level 5

I am using PROC MEANS to create a count summary of my data. I am interested in values of COUNT >= 5, and the code below works for that. But I need every value < 5 to be set to 0, and I'm not sure how to do that. PROC MEANS doesn't accept IF-THEN statements. Ideally the statement would be: if count < 5 then count = 0;

 

proc means data=data noprint;
    by condition session subject;
    where (count>=5);
    var count;
    output out=count_summary (count)=total;

 

Thank you for your help!

 

mkeintz
PROC Star

What is the expression (count)=total; supposed to mean in the statement

output out=count_summary (count)=total;

It is an invalid statement in SAS.

Could you provide a correct statement showing your current working code?
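For reference, here is a minimal sketch of a syntactically valid version of that step, assuming the intent was to count and total the values of COUNT that are >= 5 within each BY group. The input data set name HAVE and the output names N_GE5 and TOTAL are illustrative assumptions, and the rows are the sample data posted later in the thread. In the OUTPUT statement, each statistic keyword (here N and SUM) is followed by = and the name of the variable that will hold it.

```sas
/* Illustrative input: the sample rows posted later in the thread */
data have;
  input condition session subject $ count;
  datalines;
1 1 a 3
1 1 a 6
1 2 a 5
1 1 b 7
1 1 b 8
1 2 b 1
1 2 b 5
;
run;

/* BY-group processing needs the data sorted by the BY variables */
proc sort data=have;
  by condition session subject;
run;

proc means data=have noprint;
  by condition session subject;
  where count >= 5;
  var count;
  /* statistic-keyword=new-variable-name */
  output out=count_summary n=n_ge5 sum=total;
run;
```

Note that WHERE discards rows before the BY groups are summarized, so a group whose values are all below 5 does not appear in COUNT_SUMMARY at all; it is not reported as 0.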

PaigeMiller
Diamond | Level 26

I am interested in values of the count >= 5 and the code below works for that. But I need to have everything <5 = 0 and I'm not sure how to do that.

 

If I read this literally, your code now produces an analysis for records with COUNT>=5 and you want to modify the analysis so if COUNT<5 then it should be treated as zero. Is that what you want?

 

What analysis are you doing? Mean? Sum? Number of records? Kurtosis? Like @mkeintz, I'm confused; I can't tell from your code.

--
Paige Miller
user445
Calcite | Level 5

Thank you for the responses, and apologies for the confusion. I am interested in the sum of the values equal to or above 5. I still need to include anything less than 5 in my analysis for power reasons, so I need those values to equal 0. I could do this by creating a new data set with an IF-THEN statement, but for unrelated reasons I need to do it in this step.
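One way to get that sum without creating an intermediate data set is PROC SQL, which can apply the condition inside the aggregate in a single step. This is a sketch under assumptions: HAVE is a hypothetical input data set holding the sample rows posted later in the thread, and WIDTH_SUMMARY and TOTAL are illustrative names. Because the CASE expression replaces widths below 5 with 0 instead of filtering them out, every condition*session*subject combination stays in the result.

```sas
/* Illustrative input: the sample rows posted later in the thread */
data have;
  input condition session subject $ width;
  datalines;
1 1 a 3
1 1 a 6
1 2 a 5
1 1 b 7
1 1 b 8
1 2 b 1
1 2 b 5
;
run;

proc sql;
  create table width_summary as
  select condition, session, subject,
         /* widths below 5 contribute 0 instead of being filtered out */
         sum(case when width >= 5 then width else 0 end) as total
  from have
  group by condition, session, subject
  order by condition, session, subject;
quit;
```

For combinations that have at least one width >= 5, the totals match the WHERE-based approach; the difference is that a combination whose widths are all below 5 shows up with a total of 0 instead of being dropped.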

 

Hopefully that makes more sense. Here is the code with clearer variable names, which I think will be more helpful:

 

proc means data=data noprint;
    by condition session subject;
    where (width>=5);
    var width;
    output out=count_summary (width)=total;

 

Thank you!

ballardw
Super User

It might help to show us a small example of the data, maybe 15 or 20 rows, along with an example of the desired output that you create by hand (you count things).

user445
Calcite | Level 5

Each subject has 1 condition, multiple sessions, and multiple widths per session. I am looking at widths greater than or equal to 5, but I need to include all data points, so the rows with width = 3 and width = 1 would need to be set to 0.

 

Sample data:

Condition  Session  Subject  Width
1          1        a        3
1          1        a        6
1          2        a        5
1          1        b        7
1          1        b        8
1          2        b        1
1          2        b        5

PaigeMiller
Diamond | Level 26

@user445 wrote:

I am interested in the sum of the values equal to or above 5. Anything less than 5 I still need to include in my analysis for power reasons, so I need it to equal 0.


Got it. You want the sum, which is easy enough to get. Please note: this method will not let you compute a mean (even though the sum is the numerator of the mean), because it does not give you the proper denominator. I have no idea what "power reasons" means, which means you still haven't really explained the complete problem you are trying to solve. With this code, the original data set remains intact and unchanged, so you can still analyze it for "power reasons".

 

proc means data=data nway;
    where width>=5;
    class condition session subject;
    var width;
    output out=want sum=sum_width;
run;

I still have the uncomfortable feeling that, if we knew the full analysis you want to do, the above code could easily be modified to get you there, or perhaps an entirely different path could be suggested.

user445
Calcite | Level 5

@PaigeMiller Sorry, I'm new to SAS, and I guess I didn't give the complete information needed. I am trying to find the total observations or frequency the width is above a given width. For this instance, I need to know the total observations a subject has per session at or above 5. I will then do a mixed effects ANOVA. That is where the power issue comes in. I have 1000 total observations condition*session*subject and when I set the width higher to say 20, I only have 90 observations. That is why I need to keep all the observations and have width = 0. Or some other way of keeping the widths below the threshold in the data set without excluding them.

 

Thank you for the code suggestion. I was able to get it to work, but when I ran it I still got the same number of observations (870), which is the number of condition*session*subject combinations that had a width >= 5.
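The 870 is expected with a WHERE statement: WHERE removes rows before PROC MEANS forms the groups, so a condition*session*subject combination with no width >= 5 has nothing left to summarize and drops out. One way to keep every combination, sketched here under assumptions, is to leave all rows in and summarize a 0/1 indicator instead; the SUM of the indicator is the count at or above the threshold. The DATA step view FLAGGED (a hypothetical name) computes the indicator as PROC MEANS reads the rows, without writing a second copy of the data, although it is still a separate step. The macro variable THRESHOLD is also hypothetical; it is set to 7 here only so that some of the sample groups come out as 0.

```sas
%let threshold = 7; /* illustrative value; 5 or 20 work the same way */

/* Illustrative input: the sample rows posted earlier in the thread */
data have;
  input condition session subject $ width;
  datalines;
1 1 a 3
1 1 a 6
1 2 a 5
1 1 b 7
1 1 b 8
1 2 b 1
1 2 b 5
;
run;

/* DATA step view: the indicator is computed on the fly */
data flagged / view=flagged;
  set have;
  /* a comparison evaluates to 1 (true) or 0 (false);
     a missing width compares as smaller than any number, so it gets 0 */
  at_or_above = (width >= &threshold);
run;

/* No WHERE, so every combination survives; SUM of the indicator is the count */
proc means data=flagged nway noprint;
  class condition session subject;
  var at_or_above;
  output out=count_summary(drop=_type_ _freq_) sum=count;
run;
```

With the threshold at 7, only condition 1, session 1, subject b has a nonzero count (2); the other three combinations are kept with a count of 0.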

 

Is there any other information I can provide?

PaigeMiller
Diamond | Level 26

I am trying to find the total observations or frequency the width is above a given width.

So you said very clearly earlier you wanted a sum, but now you don't want a sum.

 

I need to know the total observations a subject has per session at or above 5.

This can be computed by PROC SUMMARY, PROC MEANS or PROC FREQ.

 

I will then do a mixed effects ANOVA.

Please specify the predictor variables and the response variable(s), and where the above-mentioned number of observations at or above 5 is used.

 

That is where the power issue comes in. I have 1000 total observations condition*session*subject and when I set the width higher to say 20, I only have 90 observations. That is why I need to keep all the observations and have width = 0. Or some other way of keeping the widths below the threshold in the data set without excluding them.

Power to test what hypothesis?

 

You are going to use power to determine the threshold, above which a value is kept and below which it is set to zero? You are going to use observations whose values we deliberately set to zero in some power calculation?

 

As far as I know, power can be used to determine the number of data points you need to achieve certain conditions. As far as I know, power cannot be used to determine a "tuning parameter" in an analysis, such as the threshold above which you leave the data alone and below which the data is zero.

 

As you can see, I'm very confused, and many items in your description aren't making sense to me. The counting or summing of observations is relatively simple to code, but what happens next isn't clear at all.

user445
Calcite | Level 5

My apologies for misspeaking earlier. In the PROC MEANS step I am calculating the number of widths at or above a threshold for a given subject in a given session. My PROC MIXED code for the mixed-effects ANOVA works perfectly. All I need is a way to keep everything below the threshold in the data set, counted as zero. This would be the ideal table:

 

Condition  Session  ID  Count
1          1        a   0
1          2        a   0
1          3        a   0
1          4        a   5

 

So the 4th session is where the subject had a width greater than or equal to 5.
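A table in that layout can also be produced in one PROC SQL step. This is a sketch, not a confirmed answer from the thread: it assumes a hypothetical input data set HAVE holding the sample rows posted earlier, renames SUBJECT to ID to match the table above, and relies on SAS evaluating a comparison such as width >= 5 to 1 or 0, so that summing it counts the qualifying rows. Every condition*session*subject combination present in the input appears in the output, and combinations with no qualifying widths get a count of 0.

```sas
/* Illustrative input: the sample rows posted earlier in the thread */
data have;
  input condition session subject $ width;
  datalines;
1 1 a 3
1 1 a 6
1 2 a 5
1 1 b 7
1 1 b 8
1 2 b 1
1 2 b 5
;
run;

proc sql;
  create table want as
  select condition, session, subject as id,
         /* each row adds 1 if width >= 5, otherwise 0 */
         sum(width >= 5) as count
  from have
  group by condition, session, subject
  order by condition, session, id;
quit;
```

Changing 5 to another threshold (for example 20) changes only the counts, not the number of rows in WANT, which is what keeps all combinations available for the PROC MIXED step.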
