I have a dataset with variable id and time among many. I created a variable count to count the number of observations for each id in each time. For example, for id 1 and time 1, there are totally 30 observations, the count variable will show 1, 2, until 30, for id 1 time 2, there are totally 36 observations, count will be 1 to 36. Now I want to drop any id that has less than 36 observations. What code should I use? I tried to use where count=36, but apparently that does not work as it will only keep observations with count variable = 36. Thanks.
proc sql;
create table want as
select *
from have
group by id,time
having count(*) ge 36 ;
quit;
Ksharp
Hi,
Try to use :
if count <36 then delete;
Regards
Uma Shanker Saini
This cannot work b/c I want only ids that have full 36 observations. Please see example below.
id time count
1 1 1
1 1 2
...
1 1 30
1 2 1
1 2 2
...
1 2 36
I want to delete all obs with id=1 and time=1 since only 30 observations for id=1 and time=1. I want to keep all obs with id=1 and time=2.
Hi,
rewriting my understanding..
you are having a data, It has 30 observation with ID=1 and time =1 and it also has 36 observation with ID=1 and time =2.
You just want to keep only those observation which have ID=1 and Time =2.
i am just writing only one if statement for this...
If Id=1 and time =2 then output datasetname; * it would create a new data set *
or
If id=1 and time=1 then delete;
else output DSN;
or
Data have;
set want;
if id=1 and time=2;
run;
Kindly correctly me, if i am wrong..
HI, you are right, but I have a huge dataset with different id and time. It's impossible for me to write the way you showed. I cannot predict which id and time combination meets requirement.
proc sql;
create table want as
select *
from have
group by id,time
having count(*) ge 36 ;
quit;
Ksharp
Hi Sean,
Is this what you are trying to achieve?
PROC SQL;
CREATE TABLE WANT AS
SELECT A.*
FROM
(SELECT *
FROM WORK.HAVE
) A /*Creates an Alias for use in the where clause*/,
(
SELECT ID,
TIME1,
MAX(COUNT) AS MAX_COUNT
FROM WORK.HAVE
GROUP BY ID, TIME1
HAVING CALCULATED MAX_COUNT >= 36
) B /*Creates an Alias for use in the where clause*/
WHERE A.ID = B.ID
AND A.TIME1 = B.TIME1
ORDER BY ID, TIME1, COUNT
;
QUIT;
If you aren't familiar with SQL you could also use a SAS merge as follows, which is more efficient.
DATA WANT;
MERGE HAVE (IN=A WHERE=(COUNT >= 36))
HAVE (IN=B);
BY ID TIME1;
IF A AND B;
RUN;
I hope that this is what you were after.
Regards,
Scott
Hi Scott. You are right in this way, but a bit long in coding compared to the one solution above. Thanks anyway.
Hi @Seanz.
If i understand correctly here is the solution.
with the bellow code you will get observations that ID have full 4 observations.others can be deleted.
Data A;
input ID time;
cards;
1 1
1 1
1 1
1 2
1 2
1 2
1 2
1 3
1 3
;
run;
Data Aa;
set a;
by time;
if First.time then count=0;
count+1;
run;
Proc Sql;
Select Max(Time) into :Values separated by ' ' from aa
Group by Time
having max(Count) lt 4;
quit;
Data Aaa(drop=count);
set AA;
if time in (&values) then delete;
run;
Thanks &Regards.
Sanjeev.K
Message was edited by: sajeev kuridi
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.