BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
akberali67
Calcite | Level 5

Hi,

Is there a procedure or data step to group same values in variables together. This is my data and what I am trying to achieve:

Data:

Var1     Var2

A          1

B          2

C          2

D          1

E          3

F          6

G          6

H          6

I            6

J           5

K          4

Desired Result:

Var1     Var2

B          2

C          2

F          6

G          6

H          6

I            6

I basically want all the same value in the variable along side each other to be extracted and delete the rest. I am not sure whether its possible technically because manually the dataset is way too large. Thanks in advance for the help.

-Akber.

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

The question seems to be : How to eliminate sequences of var2 of length = 1. Here is a simple way :

data have;
input Var1 $   Var2;
datalines;
A          1
B          2
C          2
D          1
E          3
F          6
G          6
H          6
I          6
J          5
K          4
;

data want;
set have; by var2 notsorted;
if not (first.var2 and last.var2);
run;

PG

PG

View solution in original post

13 REPLIES 13
AncaTilea
Pyrite | Level 9

I don't understand how did you summarize the data set that you say you want.

What happened to the values 1,3,4 and 5?

When you say you want all the same values in the variable, you mean var1 or var2 in your data set?

Please give a bit more detail.

Thanks!

akberali67
Calcite | Level 5

Hi,

I didnt summarize this using code, that is what I was TRYING to find out.

I wanted to pick all rows with 'Var2' being exactly the same and together in sequence and delete the rest. Is there anyway to achieve this?

I mean occurrence of Var2 say equal to 2 more than once in continuous sequence. Thanks for trying to help.

-Akber.

PGStats
Opal | Level 21

The question seems to be : How to eliminate sequences of var2 of length = 1. Here is a simple way :

data have;
input Var1 $   Var2;
datalines;
A          1
B          2
C          2
D          1
E          3
F          6
G          6
H          6
I          6
J          5
K          4
;

data want;
set have; by var2 notsorted;
if not (first.var2 and last.var2);
run;

PG

PG
akberali67
Calcite | Level 5

Hi PG,

Thanks for the answer. It is quite difficult to explain but what I am trying to do is: Keep rows with var2 recurring and delete the ones that are not for example 1, 3, 5, 4 which are not recurring again in the next row. Thanks again.

-Akber.

art297
Opal | Level 21

: PG has already provided the best answer, the one that is easiest to code and takes the least time to run.  However, since I was comparing the processing time with another way to accomplish the same thing, thought I'd post another way of doing it.

I recommend using PG's method, but a better way of explaining what you wanted would be to say that you only want records that have var2 values that are either the same as the previous record or the same as the next record.  PG's code does that nicely.  The following does the same thing, but runs slightly slower (for a 5 million record file, PG's code ran in 1.3 seconds, where the following code took 1.5 seconds):

data want (drop=next);

  set have;

  set have ( firstobs = 2 keep = var2 rename = (var2 = next) )

      have (      obs = 1 drop = _all_                       );

  if var2 eq lag(var2) or var2 eq next;

run;

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

Hey  Arthur,

how would you know this info if you dont run the actual data?


(for a 5 million record file, PG's code ran in 1.3 seconds, where the following code took 1.5 seconds)

art297
Opal | Level 21

: I took the original example, concatenated 500,000 copies of it so each 11 records were in the same order as the original example, and ran both both sets of code on the expanded file.  The only reason I made the file bigger was so that I could see the time differences.  On just the example dataset, both programs took 0 seconds to run.

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

sorry but how did you manage to concatenate that many copies?

Any sort of abbreviation you've used here on the concatenation statement?

The little book of SAS does not have many of the tricks Smiley Happy

art297
Opal | Level 21

: No trick.  There is probably an easier way, but I just read and wrote out the dataset 500,000 times.  I.e.:

data have;

input Var1 $   Var2;

datalines;

A          1

B          2

C          2

D          1

E          3

F          6

G          6

H          6

I          6

J          5

K          4

;

data test (drop=i);

  do i=1 to 500000;

    do j=1 to 11;

      set have point=j;

      output;

    end;

  end;

  stop;

run;

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

thanks  alot

akberali67
Calcite | Level 5

Thanks Arthur, ur code is classy and the explanation suttle as always. Smiley Happy

AncaTilea
Pyrite | Level 9

Thank you PG for clarifying.

I modified your code to see what's happening;

data want;

set have; by var2 notsorted;

*if not (first.var2 and last.var2);

if first.var2 = 1 and last.var2 = 1 then to_delete = 1;

run;

so I can see clearly what gets deleted.

Thanks!

akberali67
Calcite | Level 5

Thanks for the help!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1242 views
  • 8 likes
  • 5 in conversation