Solved: Re: How to find variable that is present in each time period

Ltwo · Posted 07-31-2017 02:05 AM

Hi SAS experts, I have a dataset that shows something like this:

data SASHELP.SAMPLE;
  infile datalines dsd truncover;
  input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
  format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;

What I'm basically looking for, is to identify PERSONs that have recorded an observation in each of the 4 dates that I have.

So as we can see from the dataset above, only PERSON="1000" satisfies my requirement. So I want my eventual dataset to only contain PERSONs that meets the requirement.

data SASHELP.WANT;
 infile datalines dsd truncover;
  input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
  format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
08/01/2017,1000,XY
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1000,ABC
;;;;

Ksharp · Posted 07-31-2017 08:43 AM




data SAMPLE;
  infile datalines dsd truncover;
  input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
  format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;

proc sql;
create table want as
 select *
  from sample
   group by person
    having count(distinct date)=
(select count(distinct date) from sample)
order by person,date;
quit;

View solution in original post

LinusH · Posted 07-31-2017 02:19 AM

If your real data is not huge, I would first query how many dates that you have, and store that as a macro variable (SQL SELECT INTO:...).
Then the next query with a group by on Person, and having count (date) = &yourmacrovarcreatedinthepreviousstep.

Can a person gave multiple entries for a given date?

Data never sleeps

Ltwo · Posted 07-31-2017 02:23 AM

Unfortunately the real data is huge, with millions of rows. The date is always a Sunday (not sure if this is something I can work off?) and represents the week starting (so e.g. week starting Sunday 01/01/2017). So let's say that I would like to see how many PERSONs have observations every Sunday for a particular period (e.g. 50 or 100 weeks).

A PERSON can have multiple entries a given date.

Kurt_Bremser · Posted 07-31-2017 02:30 AM

You can do it in a single SQL step, but be aware that the performance might be bad if your dataset is large:

proc sql;
create table want as
select a.*
from sample a
where a.person in (
  select b.person
  from sample b
  group by b.person
  having count(distinct date) >= 4
);
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

gamotte · Posted 07-31-2017 10:09 AM

Hello,

I don't think you need a subrequest here.

proc sql noprint;
	CREATE TABLE want AS
	SELECT *
	FROM SAMPLE
	GROUP BY PERSON
	HAVING count(DISTINCT DATE) ge 4;
quit;

Ksharp · Posted 07-31-2017 08:43 AM




data SAMPLE;
  infile datalines dsd truncover;
  input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
  format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;

proc sql;
create table want as
 select *
  from sample
   group by person
    having count(distinct date)=
(select count(distinct date) from sample)
order by person,date;
quit;

Classroom Training Available!