Spent a long time trying to figure out how it can be done. But failed. Let's say my dataset look like this:
Sl. Return Conz
1 4.5563 missing
2 4.8562 missing
3 0.0000 1
4 3.5879 missing
5 0.0000 missing
6 0.0000 missing
7 0.0000 missing
8 0.0000 4
9 2.5879 missing
10 3.6879 missing
11 3.6869 missing
12 4.5859 missing
13 0.0000 missing
14 0.0000 missing
15 0.0000 missing
16 0.0000 missing
17 0.0000 missing
18 0.0000 6
19 4.5879 missing
20 3.5629 missing
This dataset is a result of a merge. Hence, Conz has missing values. Conz variable is a count of zero returns. I want to delete corresponding zero returns for Conz>= 4. So, row 5-8 and row 13-18 should get deleted.
How can I achieve this???
Much thanks!
You have to read and count observations until CONZ>=4 (or until you encounter end of data set have). Then set a keep count as the original count-CONZ (if CONZ>=4) or the original count otherwise. Reread the original count, outputting only up to the keep count.
data have;
input Sl Return Conz;
datalines;
1 4.5563 .
2 4.8562 .
3 0.0000 1
4 3.5879 .
5 0.0000 .
6 0.0000 .
7 0.0000 .
8 0.0000 4
9 2.5879 .
10 3.6879 .
11 3.6869 .
12 4.5859 .
13 0.0000 .
14 0.0000 .
15 0.0000 .
16 0.0000 .
17 0.0000 .
18 0.0000 6
19 4.5879 .
20 3.5629 .
run;
data want (drop=_:);
do _n=1 by 1 until (conz>=4 or end_of_have);
set have end=end_of_have;
end;
if conz>=4 then _keep=_n-conz;
else _keep=_n;
do _j=1 to _n; /*Now reread*/
set have;
if _j<=_keep then output; /*and selectively output*/
end;
run;
Please post the data in a usable form: as data-step with datalines statement, so that we have all variables exactly in the form you have them.
Unfortunately, the dataset I have is huge one having more than 800k observations. There are 19 columns. I do not have the datalines. What I have described above is the problem which I need to solve. I am sorry; I cannot get how I can make it easy for you.
@d6k5d3 wrote:
Unfortunately, the dataset I have is huge one having more than 800k observations. There are 19 columns. I do not have the datalines. What I have described above is the problem which I need to solve. I am sorry; I cannot get how I can make it easy for you.
Obviously, we do not need all the observations, just enough to test code, and mainly for having the structure of your dataset. That's quite easy to do, a tool for this is in my footnotes.
Creating fake data for testing in a data step with datalines is one of THE essential skills you'll need, so learn that NOW.
You have to read and count observations until CONZ>=4 (or until you encounter end of data set have). Then set a keep count as the original count-CONZ (if CONZ>=4) or the original count otherwise. Reread the original count, outputting only up to the keep count.
data have;
input Sl Return Conz;
datalines;
1 4.5563 .
2 4.8562 .
3 0.0000 1
4 3.5879 .
5 0.0000 .
6 0.0000 .
7 0.0000 .
8 0.0000 4
9 2.5879 .
10 3.6879 .
11 3.6869 .
12 4.5859 .
13 0.0000 .
14 0.0000 .
15 0.0000 .
16 0.0000 .
17 0.0000 .
18 0.0000 6
19 4.5879 .
20 3.5629 .
run;
data want (drop=_:);
do _n=1 by 1 until (conz>=4 or end_of_have);
set have end=end_of_have;
end;
if conz>=4 then _keep=_n-conz;
else _keep=_n;
do _j=1 to _n; /*Now reread*/
set have;
if _j<=_keep then output; /*and selectively output*/
end;
run;
According to your description, that's a simple where condition:
where return ne 0 or conz ne .;
You can use the same condition as a subsetting if in your merge data step.
@d6k5d3 wrote:
If I write "return ne 0", I would lose all zero return observations, which I do not want.
PLEASE.READ.MY.WHOLE.POST. With diligence.
My condition will remove only observations where return is zero AND conz is missing.
Just test it.
I think the OP is worried about a sequence of 2 records with return=0 followed by a record with CONZ=2. Although that situation is not in the sample date, I believe it is supposed to be kept.
@mkeintz wrote:
I think the OP is worried about a sequence of 2 records with return=0 followed by a record with CONZ=2. Although that situation is not in the sample date, I believe it is supposed to be kept.
Well, that's what happens when someone posts incomplete example data in unusable form. Maxim 42.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.