BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
d6k5d3
Pyrite | Level 9

Spent a long time trying to figure out how it can be done. But failed. Let's say my dataset look like this:

 

Sl.  Return  Conz

 

1    4.5563  missing

2    4.8562  missing

3    0.0000  1

4    3.5879  missing

5    0.0000  missing

6    0.0000  missing

7    0.0000  missing

8    0.0000  4

9    2.5879  missing

10  3.6879  missing

11  3.6869  missing

12  4.5859  missing

13  0.0000  missing

14  0.0000  missing

15  0.0000  missing

16  0.0000  missing

17  0.0000  missing

18  0.0000  6

19  4.5879  missing

20  3.5629  missing

 

This dataset is a result of a merge. Hence, Conz has missing values. Conz variable is a count of zero returns. I want to delete corresponding zero returns for Conz>= 4. So, row 5-8 and row 13-18 should get deleted.

 

How can I achieve this???

 

Much thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You have to read and count observations until CONZ>=4 (or until you encounter end of data set have).  Then set a keep count as the original count-CONZ (if CONZ>=4)  or the original count otherwise.  Reread the original count, outputting only up to the keep count. 

 

data have;
  input Sl  Return  Conz;
datalines;
 1  4.5563  .
 2  4.8562  .
 3  0.0000  1
 4  3.5879  .
 5  0.0000  .
 6  0.0000  .
 7  0.0000  .
 8  0.0000  4
 9  2.5879  .
10  3.6879  .
11  3.6869  .
12  4.5859  .
13  0.0000  .
14  0.0000  .
15  0.0000  .
16  0.0000  .
17  0.0000  .
18  0.0000  6
19  4.5879  .
20  3.5629  .
run;

data want (drop=_:);
  do _n=1 by 1 until (conz>=4 or end_of_have);
    set have end=end_of_have;
  end;
  if conz>=4 then _keep=_n-conz;
  else _keep=_n;
  do _j=1 to _n;  /*Now reread*/
    set have;      
    if _j<=_keep then output; /*and selectively output*/
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
andreas_lds
Jade | Level 19

Please post the data in a usable form: as data-step with datalines statement, so that we have all variables exactly in the form you have them.

d6k5d3
Pyrite | Level 9

Unfortunately, the dataset I have is huge one having more than 800k observations. There are 19 columns. I do not have the datalines. What I have described above is the problem which I need to solve. I am sorry; I cannot get how I can make it easy for you.

Kurt_Bremser
Super User

@d6k5d3 wrote:

Unfortunately, the dataset I have is huge one having more than 800k observations. There are 19 columns. I do not have the datalines. What I have described above is the problem which I need to solve. I am sorry; I cannot get how I can make it easy for you.


Obviously, we do not need all the observations, just enough to test code, and mainly for having the structure of your dataset. That's quite easy to do, a tool for this is in my footnotes.

Creating fake data for testing in a data step with datalines is one of THE essential skills you'll need, so learn that NOW.

mkeintz
PROC Star

You have to read and count observations until CONZ>=4 (or until you encounter end of data set have).  Then set a keep count as the original count-CONZ (if CONZ>=4)  or the original count otherwise.  Reread the original count, outputting only up to the keep count. 

 

data have;
  input Sl  Return  Conz;
datalines;
 1  4.5563  .
 2  4.8562  .
 3  0.0000  1
 4  3.5879  .
 5  0.0000  .
 6  0.0000  .
 7  0.0000  .
 8  0.0000  4
 9  2.5879  .
10  3.6879  .
11  3.6869  .
12  4.5859  .
13  0.0000  .
14  0.0000  .
15  0.0000  .
16  0.0000  .
17  0.0000  .
18  0.0000  6
19  4.5879  .
20  3.5629  .
run;

data want (drop=_:);
  do _n=1 by 1 until (conz>=4 or end_of_have);
    set have end=end_of_have;
  end;
  if conz>=4 then _keep=_n-conz;
  else _keep=_n;
  do _j=1 to _n;  /*Now reread*/
    set have;      
    if _j<=_keep then output; /*and selectively output*/
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
d6k5d3
Pyrite | Level 9
Thank you very much! It works great! However, since I am a beginner, I am having some difficulty in getting the code. I wish I could understand:
what does it mean when you write _:, _n, _keep, _j (what's the significance of keeping an underscore. How is it different from _TYPE_ or _FREQ_? I don't know what I should search online to get an understanding of this.)

What is happening when you write "do _n=1 by 1". You did not mention 'by' in "do _j=1 to _n".

Much thanks once again!
Kurt_Bremser
Super User

According to your description, that's a simple where condition:

where return ne 0 or conz ne .;

You can use the same condition as a subsetting if in your merge data step.

d6k5d3
Pyrite | Level 9
If I write "return ne 0", I would lose all zero return observations, which I do not want.
Kurt_Bremser
Super User

@d6k5d3 wrote:
If I write "return ne 0", I would lose all zero return observations, which I do not want.

PLEASE.READ.MY.WHOLE.POST. With diligence.

My condition will remove only observations where return is zero AND conz is missing.

Just test it.

mkeintz
PROC Star

@Kurt_Bremser

 

I think the OP is worried about a sequence of 2 records with return=0 followed by a record with CONZ=2.  Although that situation is not in the sample date, I believe it is supposed to be kept.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Kurt_Bremser
Super User

@mkeintz wrote:

@Kurt_Bremser

 

I think the OP is worried about a sequence of 2 records with return=0 followed by a record with CONZ=2.  Although that situation is not in the sample date, I believe it is supposed to be kept.


Well, that's what happens when someone posts incomplete example data in unusable form. Maxim 42.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1210 views
  • 4 likes
  • 4 in conversation