DATA Step, Macro, Functions and more

How to drop observations with counts less than a number

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 122
Accepted Solution

How to drop observations with counts less than a number

I have a dataset with variable id and time among many. I created a variable count to count the number of observations for each id in each time. For example, for id 1 and time 1, there are totally 30 observations, the count variable will show 1, 2, until 30, for id 1 time 2, there are totally 36 observations, count will be 1 to 36. Now I want to drop any id that has less than 36 observations. What code should I use? I tried to use where count=36, but apparently that does not work as it will only keep observations with count variable = 36. Thanks.


Accepted Solutions
Solution
‎07-10-2013 03:00 AM
Super User
Posts: 10,028

Re: How to drop observations with counts less than a number

proc sql;

create table want as

  select *

   from have

    group by id,time

     having count(*) ge 36 ;

quit;

Ksharp

View solution in original post


All Replies
Frequent Contributor
Posts: 118

Re: How to drop observations with counts less than a number


Hi,

Try to use :

if count <36 then delete;

Regards

Uma Shanker Saini

Frequent Contributor
Posts: 122

Re: How to drop observations with counts less than a number

Posted in reply to umashankersaini

This cannot work b/c I want only ids that have full 36 observations. Please see example below.

id     time     count

1     1          1

1     1          2

...

1     1          30

1     2          1

1     2          2

...

1     2          36

I want to delete all obs with id=1 and time=1 since only 30 observations for id=1 and time=1. I want to keep all obs with id=1 and time=2.

Frequent Contributor
Posts: 118

Re: How to drop observations with counts less than a number

Hi,

rewriting my understanding..

you are having a data, It has 30 observation with ID=1 and time =1 and it also has 36 observation with ID=1 and time =2.

You just want to keep only those observation which have ID=1 and Time =2.

i am just writing only one if statement for this...

If Id=1 and time =2 then output datasetname; * it would create a new data set *

or

If id=1 and time=1 then delete;

else output DSN;

or

Data have;

set want;

if id=1 and time=2;

run;

Kindly correctly me, if i am wrong..

Frequent Contributor
Posts: 122

Re: How to drop observations with counts less than a number

Posted in reply to umashankersaini

HI, you are right, but I have a huge dataset with different id and time. It's impossible for me to write the way you showed. I cannot predict which id and time combination meets requirement.

Solution
‎07-10-2013 03:00 AM
Super User
Posts: 10,028

Re: How to drop observations with counts less than a number

proc sql;

create table want as

  select *

   from have

    group by id,time

     having count(*) ge 36 ;

quit;

Ksharp

Super Contributor
Posts: 297

Re: How to drop observations with counts less than a number

Hi Sean,

Is this what you are trying to achieve?

PROC SQL;

CREATE TABLE WANT AS

SELECT A.*

FROM

  (SELECT *

  FROM WORK.HAVE

     ) A /*Creates an Alias for use in the where clause*/,

  (

     SELECT ID,

         TIME1,

         MAX(COUNT) AS MAX_COUNT

     FROM WORK.HAVE

  GROUP BY ID, TIME1

  HAVING CALCULATED MAX_COUNT >= 36

     ) B /*Creates an Alias for use in the where clause*/

  WHERE A.ID = B.ID

  AND A.TIME1 = B.TIME1

  ORDER BY ID, TIME1, COUNT

  ;

QUIT;

If you aren't familiar with SQL you could also use a SAS merge as follows, which is more efficient.

DATA WANT;

  MERGE HAVE (IN=A WHERE=(COUNT >= 36))

       HAVE (IN=B);

  BY ID TIME1;

  IF A AND B;

RUN;

I hope that this is what you were after.

Regards,

Scott

Frequent Contributor
Posts: 122

Re: How to drop observations with counts less than a number

Posted in reply to Scott_Mitchell

Hi Scott. You are right in this way, but a bit long in coding compared to the one solution above. Thanks anyway.

Super Contributor
Posts: 276

Re: How to drop observations with counts less than a number

Hi @Seanz.

If i understand correctly here is the solution.

with the bellow code you will get observations that ID have full 4 observations.others can be deleted.

Data A;

input ID time;

cards;

1 1

1 1

1 1

1 2

1 2

1 2

1 2

1 3

1 3

;

run;

Data Aa;

set a;

by time;

if First.time then count=0;

count+1;

run;

Proc Sql;

Select Max(Time) into :Values separated by ' '  from aa

Group by Time

having max(Count) lt 4;

quit;

Data Aaa(drop=count);

set AA;

if time in (&values) then delete;

run;

Thanks &Regards.

Sanjeev.K

Message was edited by: sajeev kuridi

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 4114 views
  • 9 likes
  • 5 in conversation