Hi SAS experts, I have a dataset that shows something like this:
data SASHELP.SAMPLE;
infile datalines dsd truncover;
input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;
What I'm basically looking for, is to identify PERSONs that have recorded an observation in each of the 4 dates that I have.
So as we can see from the dataset above, only PERSON="1000" satisfies my requirement. So I want my eventual dataset to only contain PERSONs that meets the requirement.
data SASHELP.WANT;
infile datalines dsd truncover;
input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
08/01/2017,1000,XY
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1000,ABC
;;;;
data SAMPLE; infile datalines dsd truncover; input DATE:DDMMYY10. PERSON:$4. TASK:$4.; format DATE date9.; datalines4; 01/01/2017,1000,AB 01/01/2017,1000,ABC 01/01/2017,1111,ABCD 01/01/2017,1111,ABC 01/01/2017,2111,ABCD 08/01/2017,1000,XY 08/01/2017,2111,ABC 08/01/2017,2111,XYZ 15/01/2017,1000,ABC 15/01/2017,1000,ABCD 15/01/2017,1000,AA 22/01/2017,1111,ABC 22/01/2017,1000,ABC ;;;; proc sql; create table want as select * from sample group by person having count(distinct date)= (select count(distinct date) from sample) order by person,date; quit;
Unfortunately the real data is huge, with millions of rows. The date is always a Sunday (not sure if this is something I can work off?) and represents the week starting (so e.g. week starting Sunday 01/01/2017). So let's say that I would like to see how many PERSONs have observations every Sunday for a particular period (e.g. 50 or 100 weeks).
A PERSON can have multiple entries a given date.
You can do it in a single SQL step, but be aware that the performance might be bad if your dataset is large:
proc sql;
create table want as
select a.*
from sample a
where a.person in (
select b.person
from sample b
group by b.person
having count(distinct date) >= 4
);
quit;
Hello,
I don't think you need a subrequest here.
proc sql noprint;
CREATE TABLE want AS
SELECT *
FROM SAMPLE
GROUP BY PERSON
HAVING count(DISTINCT DATE) ge 4;
quit;
data SAMPLE; infile datalines dsd truncover; input DATE:DDMMYY10. PERSON:$4. TASK:$4.; format DATE date9.; datalines4; 01/01/2017,1000,AB 01/01/2017,1000,ABC 01/01/2017,1111,ABCD 01/01/2017,1111,ABC 01/01/2017,2111,ABCD 08/01/2017,1000,XY 08/01/2017,2111,ABC 08/01/2017,2111,XYZ 15/01/2017,1000,ABC 15/01/2017,1000,ABCD 15/01/2017,1000,AA 22/01/2017,1111,ABC 22/01/2017,1000,ABC ;;;; proc sql; create table want as select * from sample group by person having count(distinct date)= (select count(distinct date) from sample) order by person,date; quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.