Solved
Contributor
Posts: 23

# Selecting group of data based on conditions

I have the following groupings of data (fictitious data) and need a little help.

Value1   Value2   Value3   Value4   Value5   Value6

01          123         00          2            1234       1

01          123         01          3            1235       1

01          123         02          3            1234       2

02          124         00          2            1234       1

02          124         01          3            1236       1

02          124         02          3            1235       1

03          125         00          2            1230       1

03          125         01          3            1234       1

03          125         03          5            1235       1

The Value2 variable will always be the same within each group (account), such as shown above.  Where the Value4 = 2 and Value5 = 1234, then I have to include ALL of the observations in each group (i.e., all three obs in the first two groups above) into an output SAS dataset.  The last grouping of data would NOT be included.

Any help/examples are appreciated (have not yet done something like this in SAS; I'm a novice user).

Accepted Solutions
Solution
‎10-28-2014 11:58 AM
Contributor
Posts: 23

## Re: Selecting group of data based on conditions

Forgot the second parenthesis in the second "sum" line (typo).  This produced what I needed:

proc sql;

create table want as

select * from have

group by value1,value2 having

(sum(catx('  ',value4,value5,value6) = '3 1234 1') or

sum(catx('  ',value4,value5,value6) = '5 1234 1')) and

(sum(value4 = '2' and value5 ne '1234'));

quit;

All Replies
SAS Employee
Posts: 232

Posts: 1,270

## Re: Selecting group of data based on conditions

proc sql;

create table want as

select * from have

where value1 in (select value1 from have where value4=2 and value5=1234);

quit;

Contributor
Posts: 23

## Re: Selecting group of data based on conditions

Thanks so much for the help!

Contributor
Posts: 23

## Re: Selecting group of data based on conditions

I expanded the data example somewhat to be more realistic.  See below.  Given the same value4 and value5 extract criteria, the first group of obs for value1 = 01 would be pulled, the first group of obs where value1=02 would be pulled, and the second group of obs where value=03 would be pulled.  Would this change your code solution any?

Value1   Value2   Value3   Value4   Value5   Value6

01          123         00          2            1234       1

01          123         01          3            1235       1

01          123         02          3            1234       2

01          127         00          2            1237       1

01          127         01          3            1234       1

01          127         02          3            1234       1

02          124         00          2            1234       1

02          124         01          3            1236       1

02          124         02          3            1235       1

02          128         00          2            1238       1

02          128         01          3            1234       1

02          128         02          3            1234       1

03          125         00          2            1230       1

03          125         01          3            1234       1

03          125         03          5            1235       1

03          129         00          2            1234       1

03          129         01          3            1234       1

03          129         02          3            1234       1

PROC Star
Posts: 8,165

## Re: Selecting group of data based on conditions

If I correctly understand what you want to do, one minor correction needed to be made for stat's proposed code to work, namely:

proc sql;

create table want as

select * from have

where value2 in (select value2 from have where value4=2 and value5=1234);

quit;

Super User
Posts: 10,787

## Re: Selecting group of data based on conditions

```data have;
input (Value1   Value2   Value3   Value4   Value5   Value6 ) (\$);
cards;
01          123         00          2            1234       1
01          123         01          3            1235       1
01          123         02          3            1234       2
01          127         00          2            1237       1
01          127         01          3            1234       1
01          127         02          3            1234       1
02          124         00          2            1234       1
02          124         01          3            1236       1
02          124         02          3            1235       1
02          128         00          2            1238       1
02          128         01          3            1234       1
02          128         02          3            1234       1
03          125         00          2            1230       1
03          125         01          3            1234       1
03          125         03          5            1235       1
03          129         00          2            1234       1
03          129         01          3            1234       1
03          129         02          3            1234       1
;
run;
proc sql;
create table want as
select * from have group by value1,value2 having sum(catx(' ',value4,value5)='2 1234') gt 0;
quit;

```

Xia Keshan

Contributor
Posts: 23

## Re: Selecting group of data based on conditions

I tried Xia's code.  I do get a SAS dataset created, but with no observations and six columns.  Also, the SAS Log shows a message that says "The query requires remerging summary statistics back with the original data."

Contributor
Posts: 23

## Re: Selecting group of data based on conditions

One additional note.  I bring in the data from an external text file via a data step using infile and input statements.  However, the suggested Proc SQL code used was from Xia.

William

PROC Star
Posts: 8,165

## Re: Selecting group of data based on conditions

Are Value4 and Value5 both character variables? Also, are both Value1 and Value2 needed to identity a unique group, or just Value2?

The remerging summary statistics note is simply a note produced when incorporating summary functions. Nothing to be concerned about. But not getting the desired data is definitely a concern.

Contributor
Posts: 23

## Re: Selecting group of data based on conditions

The Value4 and Value5 are both character variables.  Also, yes, both Value1 and Value2 would be needed to identify unique groups.  The Value1 is basically a fictitious code (for, let's say, a State), and Value2 is simply a unique fictitious number identifier of each group within each Value1.  In other words, there will be many unique Value2's within each Value1.

PROC Star
Posts: 8,165

## Re: Selecting group of data based on conditions

Actually, 's code will work with both character and numeric. I'll have to guess that your actual data doesn't have any records where Value4=2 and Value5=1234

Contributor
Posts: 23

## Re: Selecting group of data based on conditions

My actual data does have Value4 of 2s, which are also inputted as character format.  The Value5 values (also inputted as character) are actually six-digit numbers (just did them as four-digits here for illustrative purposes).  Also, there is no summing done in the mathematical sense. Just trying to extract certain observations (records) from a database based on the criteria noted, and then output the records to a separate dataset.

William

PROC Star
Posts: 8,165

## Re: Selecting group of data based on conditions

's code is simply counting the number of times that value4 and value5 contain a 2 and 1234, respectively, within the groups records. If the combination exists at least once, all of the groups records are output. You now said that value5 is actually a 6 character variable, thus obviously 1234 won't match. I think you have to provide a better example of what your data look like and what your criteria are. His code can handle any length variable.

Contributor
Posts: 23

## Re: Selecting group of data based on conditions

My apologies, I inadvertently used an incorrect Value5 in my extract criteria.  The Proc SQL code worked and got what I needed.  Thanks so much for your help!

William

🔒 This topic is solved and locked.