Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted a month ago
(670 views)

I would like to count the number of tests collected. For each observation, a sample collected would be counted as 1. For example if visit_id 01 had samples 1, 2, 3 and 4 collected that would be counted as one. If visit_id 06 has zero samples collected that would be 0. I'm guessing I have to create a new variable that does that, I'm just not sure how. Thank you.

```
DATA COUNT;
INPUT VISIT_ID SAMPLE_1 SAMPLE_2 SAMPLE_3 SAMPLE_4;
CARDS;
01 1 1 0 1
02 1 0 1 1
03 0 0 1 0
04 1 1 1 1
```

05 1 0 0 0

06 0 0 0 0
;
RUN;

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@shami wrote:

I would like to count the number of tests collected. For each observation, a sample collected would be counted as 1. For example if visit_id 01 had samples 1, 2, 3 and 4 collected that would be counted as one. If visit_id 06 has zero samples collected that would be 0. I'm guessing I have to create a new variable that does that, I'm just not sure how. Thank you.

It really helps to provide a general rule stating variable names and values. Such as Count the observations where any of the variables Sample1 through Sample4 have at least one value > 0. Then show an example of the desired output based on your given example.

For my description above:

proc means data=count n; where max(SAMPLE_1, SAMPLE_2, SAMPLE_3, SAMPLE_4)> 0; var visit_id; run;

Which shows an output like:

Analysis Variable : VISIT_ID |
---|

N |

5 |

The Where statement means that SAS will use any observation where there is at least one value greater than 0 for the procedure.

Caveat: this would include 0.0001 as well if by chance one of the variables had such.

9 REPLIES 9

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Your words and subject line are unclear. You say "new variable that counts across" but later you say "For example if visit_id 01 had samples 1, 2, 3 and 4 collected that would be counted as one." These two are contradictory. I don't really know what you want

Nevertheless, if you have 0s and 1s, then "counts across", if that is what you want, is done like this.

```
data want;
set have;
count=sum(of sample_1-sample_4);
run;
```

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@shami wrote:

So I showed how to compute this for the sum. You wanted 1 or 0. Got it. In my example, instead of sum, here's a thought question for you ... what statistic, instead of sum, gives you a 1 or 0 depending on whether there is a 1 in the row, or all zeros in the row?

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@shami wrote:

I would like to count the number of tests collected. For each observation, a sample collected would be counted as 1. For example if visit_id 01 had samples 1, 2, 3 and 4 collected that would be counted as one. If visit_id 06 has zero samples collected that would be 0. I'm guessing I have to create a new variable that does that, I'm just not sure how. Thank you.

It really helps to provide a general rule stating variable names and values. Such as Count the observations where any of the variables Sample1 through Sample4 have at least one value > 0. Then show an example of the desired output based on your given example.

For my description above:

proc means data=count n; where max(SAMPLE_1, SAMPLE_2, SAMPLE_3, SAMPLE_4)> 0; var visit_id; run;

Which shows an output like:

Analysis Variable : VISIT_ID |
---|

N |

5 |

The Where statement means that SAS will use any observation where there is at least one value greater than 0 for the procedure.

Caveat: this would include 0.0001 as well if by chance one of the variables had such.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Your descriptions are confusing.

If you want to count the total number of SAMPLES then you want to sum both ways. Order should not matter.

So assuming that the values of the SAMPLE_n variables represents the number of test done on each sample you could use the SAS function SUM(,) to add the values in each observation and then the SQL aggregate function SUM() to find the grand total across all observations.

```
proc sql;
select sum( sum(sample_1,sample_2,sample_3_sample_4)) as grand_count
from count
;
quit;
```

But since the values you show are only 1 or 0 perhaps the SAMPLE_n variables indicate whether or not any samples were taken? (If so then why was the dataset named COUNT??) And perhaps what you want to find out is how many observations had ANY samples taken? In that case for boolean 1/0 variables the MAX() will indicate if any of them was true or not.

In that case you might want to do something like this using the SAS function MAX(,) and then the SQL function SUM().

```
proc sql
select sum(max(sample_1,sample_2,sample_3,sample_4))
as nvisits label='Number of visits with any samples'
from count;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

```
DATA COUNT;
INPUT VISIT_ID SAMPLE_1 SAMPLE_2 SAMPLE_3 SAMPLE_4;
CARDS;
01 1 1 0 1
02 1 0 1 1
03 0 0 1 0
04 1 1 1 1
```

05 1 0 0 0

06 0 0 0 0
;
RUN;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You might want to change the code based on your logic, but it sounds like you want to check if the sum of the indicator values (sample_i ) for each observation are greater than 0. If it is, then assign a value of 1, otherwise 0.

```
data want;
set count;
tests=0;
if sum(of sample_:)>0 then tests=1;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Noted.

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

Upcoming Events

- SAS Bowl XLII, The SAS Hackathon 2024 | 14-Aug-2024
- Ask the Expert: Top Tips for SAS®9 Programmers Moving to SAS® Viya® | 15-Aug-2024
- Ask the Expert: Workload Orchestration in SAS® Viya® 4 | 22-Aug-2024
- Ask the Expert: Solving Multi-Objective Optimization Models in SAS® Optimization | 29-Aug-2024
- WUSS 2024 | 04-Sep-2024
- Ask the Expert: Executing SAS Analytics From R Shiny Applications | 05-Sep-2024
- Ask the Expert: How Do I Perform Customer Segmentation With SAS Intelligent Decisioning? | 12-Sep-2024

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.