Hi,
Is there a procedure or data step to group same values in variables together. This is my data and what I am trying to achieve:
Data:
Var1 Var2
A 1
B 2
C 2
D 1
E 3
F 6
G 6
H 6
I 6
J 5
K 4
Desired Result:
Var1 Var2
B 2
C 2
F 6
G 6
H 6
I 6
I basically want all the same value in the variable along side each other to be extracted and delete the rest. I am not sure whether its possible technically because manually the dataset is way too large. Thanks in advance for the help.
-Akber.
The question seems to be : How to eliminate sequences of var2 of length = 1. Here is a simple way :
data have;
input Var1 $ Var2;
datalines;
A 1
B 2
C 2
D 1
E 3
F 6
G 6
H 6
I 6
J 5
K 4
;
data want;
set have; by var2 notsorted;
if not (first.var2 and last.var2);
run;
PG
I don't understand how did you summarize the data set that you say you want.
What happened to the values 1,3,4 and 5?
When you say you want all the same values in the variable, you mean var1 or var2 in your data set?
Please give a bit more detail.
Thanks!
Hi,
I didnt summarize this using code, that is what I was TRYING to find out.
I wanted to pick all rows with 'Var2' being exactly the same and together in sequence and delete the rest. Is there anyway to achieve this?
I mean occurrence of Var2 say equal to 2 more than once in continuous sequence. Thanks for trying to help.
-Akber.
The question seems to be : How to eliminate sequences of var2 of length = 1. Here is a simple way :
data have;
input Var1 $ Var2;
datalines;
A 1
B 2
C 2
D 1
E 3
F 6
G 6
H 6
I 6
J 5
K 4
;
data want;
set have; by var2 notsorted;
if not (first.var2 and last.var2);
run;
PG
Hi PG,
Thanks for the answer. It is quite difficult to explain but what I am trying to do is: Keep rows with var2 recurring and delete the ones that are not for example 1, 3, 5, 4 which are not recurring again in the next row. Thanks again.
-Akber.
: PG has already provided the best answer, the one that is easiest to code and takes the least time to run. However, since I was comparing the processing time with another way to accomplish the same thing, thought I'd post another way of doing it.
I recommend using PG's method, but a better way of explaining what you wanted would be to say that you only want records that have var2 values that are either the same as the previous record or the same as the next record. PG's code does that nicely. The following does the same thing, but runs slightly slower (for a 5 million record file, PG's code ran in 1.3 seconds, where the following code took 1.5 seconds):
data want (drop=next);
set have;
set have ( firstobs = 2 keep = var2 rename = (var2 = next) )
have ( obs = 1 drop = _all_ );
if var2 eq lag(var2) or var2 eq next;
run;
Hey Arthur,
how would you know this info if you dont run the actual data?
(for a 5 million record file, PG's code ran in 1.3 seconds, where the following code took 1.5 seconds)
: I took the original example, concatenated 500,000 copies of it so each 11 records were in the same order as the original example, and ran both both sets of code on the expanded file. The only reason I made the file bigger was so that I could see the time differences. On just the example dataset, both programs took 0 seconds to run.
sorry but how did you manage to concatenate that many copies?
Any sort of abbreviation you've used here on the concatenation statement?
The little book of SAS does not have many of the tricks
thanks alot
Thanks Arthur, ur code is classy and the explanation suttle as always.
Thank you PG for clarifying.
I modified your code to see what's happening;
data want;
set have; by var2 notsorted;
*if not (first.var2 and last.var2);
if first.var2 = 1 and last.var2 = 1 then to_delete = 1;
run;
so I can see clearly what gets deleted.
Thanks!
Thanks for the help!
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.