Hi SAS experts, I have a dataset that shows something like this:
data SASHELP.SAMPLE;
infile datalines dsd truncover;
input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;
What I'm basically looking for, is to identify PERSONs that have recorded an observation in each of the 4 dates that I have.
So as we can see from the dataset above, only PERSON="1000" satisfies my requirement. So I want my eventual dataset to only contain PERSONs that meets the requirement.
data SASHELP.WANT;
infile datalines dsd truncover;
input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
08/01/2017,1000,XY
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1000,ABC
;;;;
data SAMPLE; infile datalines dsd truncover; input DATE:DDMMYY10. PERSON:$4. TASK:$4.; format DATE date9.; datalines4; 01/01/2017,1000,AB 01/01/2017,1000,ABC 01/01/2017,1111,ABCD 01/01/2017,1111,ABC 01/01/2017,2111,ABCD 08/01/2017,1000,XY 08/01/2017,2111,ABC 08/01/2017,2111,XYZ 15/01/2017,1000,ABC 15/01/2017,1000,ABCD 15/01/2017,1000,AA 22/01/2017,1111,ABC 22/01/2017,1000,ABC ;;;; proc sql; create table want as select * from sample group by person having count(distinct date)= (select count(distinct date) from sample) order by person,date; quit;
Unfortunately the real data is huge, with millions of rows. The date is always a Sunday (not sure if this is something I can work off?) and represents the week starting (so e.g. week starting Sunday 01/01/2017). So let's say that I would like to see how many PERSONs have observations every Sunday for a particular period (e.g. 50 or 100 weeks).
A PERSON can have multiple entries a given date.
You can do it in a single SQL step, but be aware that the performance might be bad if your dataset is large:
proc sql;
create table want as
select a.*
from sample a
where a.person in (
select b.person
from sample b
group by b.person
having count(distinct date) >= 4
);
quit;
Hello,
I don't think you need a subrequest here.
proc sql noprint;
CREATE TABLE want AS
SELECT *
FROM SAMPLE
GROUP BY PERSON
HAVING count(DISTINCT DATE) ge 4;
quit;
data SAMPLE; infile datalines dsd truncover; input DATE:DDMMYY10. PERSON:$4. TASK:$4.; format DATE date9.; datalines4; 01/01/2017,1000,AB 01/01/2017,1000,ABC 01/01/2017,1111,ABCD 01/01/2017,1111,ABC 01/01/2017,2111,ABCD 08/01/2017,1000,XY 08/01/2017,2111,ABC 08/01/2017,2111,XYZ 15/01/2017,1000,ABC 15/01/2017,1000,ABCD 15/01/2017,1000,AA 22/01/2017,1111,ABC 22/01/2017,1000,ABC ;;;; proc sql; create table want as select * from sample group by person having count(distinct date)= (select count(distinct date) from sample) order by person,date; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.