DATA Step, Macro, Functions and more

How to find variable that is present in each time period

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

How to find variable that is present in each time period

[ Edited ]

Hi SAS experts, I have a dataset that shows something like this:

 

 

data SASHELP.SAMPLE;
  infile datalines dsd truncover;
  input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
  format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;

What I'm basically looking for, is to identify PERSONs that have recorded an observation in each of the 4 dates that I have. 

 

So as we can see from the dataset above, only PERSON="1000" satisfies my requirement. So I want my eventual dataset to only contain PERSONs that meets the requirement.

 

data SASHELP.WANT;
 infile datalines dsd truncover;
  input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
  format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
08/01/2017,1000,XY
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1000,ABC
;;;;

 


Accepted Solutions
Solution
‎08-06-2017 08:44 PM
Super User
Posts: 10,046

Re: How to find variable that is present in each time period




data SAMPLE;
  infile datalines dsd truncover;
  input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
  format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;

proc sql;
create table want as
 select *
  from sample
   group by person
    having count(distinct date)=
(select count(distinct date) from sample)
order by person,date;
quit;

View solution in original post


All Replies
Super User
Posts: 5,438

Re: How to find variable that is present in each time period

If your real data is not huge, I would first query how many dates that you have, and store that as a macro variable (SQL SELECT INTO:...).
Then the next query with a group by on Person, and having count (date) = &yourmacrovarcreatedinthepreviousstep.

Can a person gave multiple entries for a given date?
Data never sleeps
Occasional Contributor
Posts: 8

Re: How to find variable that is present in each time period

[ Edited ]

Unfortunately the real data is huge, with millions of rows. The date is always a Sunday (not sure if this is something I can work off?) and represents the week starting (so e.g. week starting Sunday 01/01/2017). So let's say that I would like to see how many PERSONs have observations every Sunday for a particular period (e.g. 50 or 100 weeks).

 

A PERSON can have multiple entries a given date.

Super User
Posts: 7,863

Re: How to find variable that is present in each time period

You can do it in a single SQL step, but be aware that the performance might be bad if your dataset is large:

proc sql;
create table want as
select a.*
from sample a
where a.person in (
  select b.person
  from sample b
  group by b.person
  having count(distinct date) >= 4
);
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Regular Contributor
Posts: 240

Re: How to find variable that is present in each time period

Posted in reply to KurtBremser

Hello,

 

I don't think you need a subrequest here.

 

proc sql noprint;
	CREATE TABLE want AS
	SELECT *
	FROM SAMPLE
	GROUP BY PERSON
	HAVING count(DISTINCT DATE) ge 4;
quit;
Solution
‎08-06-2017 08:44 PM
Super User
Posts: 10,046

Re: How to find variable that is present in each time period




data SAMPLE;
  infile datalines dsd truncover;
  input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
  format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;

proc sql;
create table want as
 select *
  from sample
   group by person
    having count(distinct date)=
(select count(distinct date) from sample)
order by person,date;
quit;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 152 views
  • 1 like
  • 5 in conversation