Solved
Contributor
Posts: 37

# Grouping same values together

Hi,

Is there a procedure or data step to group same values in variables together. This is my data and what I am trying to achieve:

Data:

Var1     Var2

A          1

B          2

C          2

D          1

E          3

F          6

G          6

H          6

I            6

J           5

K          4

Desired Result:

Var1     Var2

B          2

C          2

F          6

G          6

H          6

I            6

I basically want all the same value in the variable along side each other to be extracted and delete the rest. I am not sure whether its possible technically because manually the dataset is way too large. Thanks in advance for the help.

-Akber.

Accepted Solutions
Solution
‎11-11-2012 11:58 AM
Posts: 5,529

## Re: Grouping same values together

The question seems to be : How to eliminate sequences of var2 of length = 1. Here is a simple way :

data have;
input Var1 \$   Var2;
datalines;
A          1
B          2
C          2
D          1
E          3
F          6
G          6
H          6
I          6
J          5
K          4
;

data want;
set have; by var2 notsorted;
if not (first.var2 and last.var2);
run;

PG

PG

All Replies
Super Contributor
Posts: 543

## Re: Grouping same values together

I don't understand how did you summarize the data set that you say you want.

What happened to the values 1,3,4 and 5?

When you say you want all the same values in the variable, you mean var1 or var2 in your data set?

Please give a bit more detail.

Thanks!

Contributor
Posts: 37

## Re: Grouping same values together

Hi,

I didnt summarize this using code, that is what I was TRYING to find out.

I wanted to pick all rows with 'Var2' being exactly the same and together in sequence and delete the rest. Is there anyway to achieve this?

I mean occurrence of Var2 say equal to 2 more than once in continuous sequence. Thanks for trying to help.

-Akber.

Solution
‎11-11-2012 11:58 AM
Posts: 5,529

## Re: Grouping same values together

The question seems to be : How to eliminate sequences of var2 of length = 1. Here is a simple way :

data have;
input Var1 \$   Var2;
datalines;
A          1
B          2
C          2
D          1
E          3
F          6
G          6
H          6
I          6
J          5
K          4
;

data want;
set have; by var2 notsorted;
if not (first.var2 and last.var2);
run;

PG

PG
Contributor
Posts: 37

## Re: Grouping same values together

Hi PG,

Thanks for the answer. It is quite difficult to explain but what I am trying to do is: Keep rows with var2 recurring and delete the ones that are not for example 1, 3, 5, 4 which are not recurring again in the next row. Thanks again.

-Akber.

PROC Star
Posts: 8,164

## Re: Grouping same values together

: PG has already provided the best answer, the one that is easiest to code and takes the least time to run.  However, since I was comparing the processing time with another way to accomplish the same thing, thought I'd post another way of doing it.

I recommend using PG's method, but a better way of explaining what you wanted would be to say that you only want records that have var2 values that are either the same as the previous record or the same as the next record.  PG's code does that nicely.  The following does the same thing, but runs slightly slower (for a 5 million record file, PG's code ran in 1.3 seconds, where the following code took 1.5 seconds):

data want (drop=next);

set have;

set have ( firstobs = 2 keep = var2 rename = (var2 = next) )

have (      obs = 1 drop = _all_                       );

if var2 eq lag(var2) or var2 eq next;

run;

Super Contributor
Posts: 464

## Re: Grouping same values together

Hey  Arthur,

how would you know this info if you dont run the actual data?

(for a 5 million record file, PG's code ran in 1.3 seconds, where the following code took 1.5 seconds)

PROC Star
Posts: 8,164

## Re: Grouping same values together

: I took the original example, concatenated 500,000 copies of it so each 11 records were in the same order as the original example, and ran both both sets of code on the expanded file.  The only reason I made the file bigger was so that I could see the time differences.  On just the example dataset, both programs took 0 seconds to run.

Super Contributor
Posts: 464

## Re: Grouping same values together

sorry but how did you manage to concatenate that many copies?

Any sort of abbreviation you've used here on the concatenation statement?

The little book of SAS does not have many of the tricks

PROC Star
Posts: 8,164

## Re: Grouping same values together

: No trick.  There is probably an easier way, but I just read and wrote out the dataset 500,000 times.  I.e.:

data have;

input Var1 \$   Var2;

datalines;

A          1

B          2

C          2

D          1

E          3

F          6

G          6

H          6

I          6

J          5

K          4

;

data test (drop=i);

do i=1 to 500000;

do j=1 to 11;

set have point=j;

output;

end;

end;

stop;

run;

Super Contributor
Posts: 464

thanks  alot

Contributor
Posts: 37

## Re: Grouping same values together

Thanks Arthur, ur code is classy and the explanation suttle as always.

Super Contributor
Posts: 543

## Re: Grouping same values together

Thank you PG for clarifying.

I modified your code to see what's happening;

data want;

set have; by var2 notsorted;

*if not (first.var2 and last.var2);

if first.var2 = 1 and last.var2 = 1 then to_delete = 1;

run;

so I can see clearly what gets deleted.

Thanks!

Contributor
Posts: 37