I have a dataset with variable id and time among many. I created a variable count to count the number of observations for each id in each time. For example, for id 1 and time 1, there are totally 30 observations, the count variable will show 1, 2, until 30, for id 1 time 2, there are totally 36 observations, count will be 1 to 36. Now I want to drop any id that has less than 36 observations. What code should I use? I tried to use where count=36, but apparently that does not work as it will only keep observations with count variable = 36. Thanks.
proc sql;
create table want as
select *
from have
group by id,time
having count(*) ge 36 ;
quit;
Ksharp
Hi,
Try to use :
if count <36 then delete;
Regards
Uma Shanker Saini
This cannot work b/c I want only ids that have full 36 observations. Please see example below.
id time count
1 1 1
1 1 2
...
1 1 30
1 2 1
1 2 2
...
1 2 36
I want to delete all obs with id=1 and time=1 since only 30 observations for id=1 and time=1. I want to keep all obs with id=1 and time=2.
Hi,
rewriting my understanding..
you are having a data, It has 30 observation with ID=1 and time =1 and it also has 36 observation with ID=1 and time =2.
You just want to keep only those observation which have ID=1 and Time =2.
i am just writing only one if statement for this...
If Id=1 and time =2 then output datasetname; * it would create a new data set *
or
If id=1 and time=1 then delete;
else output DSN;
or
Data have;
set want;
if id=1 and time=2;
run;
Kindly correctly me, if i am wrong..
HI, you are right, but I have a huge dataset with different id and time. It's impossible for me to write the way you showed. I cannot predict which id and time combination meets requirement.
proc sql;
create table want as
select *
from have
group by id,time
having count(*) ge 36 ;
quit;
Ksharp
Hi Sean,
Is this what you are trying to achieve?
PROC SQL;
CREATE TABLE WANT AS
SELECT A.*
FROM
(SELECT *
FROM WORK.HAVE
) A /*Creates an Alias for use in the where clause*/,
(
SELECT ID,
TIME1,
MAX(COUNT) AS MAX_COUNT
FROM WORK.HAVE
GROUP BY ID, TIME1
HAVING CALCULATED MAX_COUNT >= 36
) B /*Creates an Alias for use in the where clause*/
WHERE A.ID = B.ID
AND A.TIME1 = B.TIME1
ORDER BY ID, TIME1, COUNT
;
QUIT;
If you aren't familiar with SQL you could also use a SAS merge as follows, which is more efficient.
DATA WANT;
MERGE HAVE (IN=A WHERE=(COUNT >= 36))
HAVE (IN=B);
BY ID TIME1;
IF A AND B;
RUN;
I hope that this is what you were after.
Regards,
Scott
Hi Scott. You are right in this way, but a bit long in coding compared to the one solution above. Thanks anyway.
Hi @Seanz.
If i understand correctly here is the solution.
with the bellow code you will get observations that ID have full 4 observations.others can be deleted.
Data A;
input ID time;
cards;
1 1
1 1
1 1
1 2
1 2
1 2
1 2
1 3
1 3
;
run;
Data Aa;
set a;
by time;
if First.time then count=0;
count+1;
run;
Proc Sql;
Select Max(Time) into :Values separated by ' ' from aa
Group by Time
having max(Count) lt 4;
quit;
Data Aaa(drop=count);
set AA;
if time in (&values) then delete;
run;
Thanks &Regards.
Sanjeev.K
Message was edited by: sajeev kuridi
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.