Hi SAS experts, I have a dataset that shows something like this:
data SASHELP.SAMPLE;
infile datalines dsd truncover;
input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;What I'm basically looking for, is to identify PERSONs that have recorded an observation in each of the 4 dates that I have.
So as we can see from the dataset above, only PERSON="1000" satisfies my requirement. So I want my eventual dataset to only contain PERSONs that meets the requirement.
data SASHELP.WANT;
infile datalines dsd truncover;
input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
08/01/2017,1000,XY
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1000,ABC
;;;;
data SAMPLE;
infile datalines dsd truncover;
input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;
proc sql;
create table want as
select *
from sample
group by person
having count(distinct date)=
(select count(distinct date) from sample)
order by person,date;
quit;
Unfortunately the real data is huge, with millions of rows. The date is always a Sunday (not sure if this is something I can work off?) and represents the week starting (so e.g. week starting Sunday 01/01/2017). So let's say that I would like to see how many PERSONs have observations every Sunday for a particular period (e.g. 50 or 100 weeks).
A PERSON can have multiple entries a given date.
You can do it in a single SQL step, but be aware that the performance might be bad if your dataset is large:
proc sql;
create table want as
select a.*
from sample a
where a.person in (
select b.person
from sample b
group by b.person
having count(distinct date) >= 4
);
quit;
Hello,
I don't think you need a subrequest here.
proc sql noprint;
CREATE TABLE want AS
SELECT *
FROM SAMPLE
GROUP BY PERSON
HAVING count(DISTINCT DATE) ge 4;
quit;
data SAMPLE;
infile datalines dsd truncover;
input DATE:DDMMYY10. PERSON:$4. TASK:$4.;
format DATE date9.;
datalines4;
01/01/2017,1000,AB
01/01/2017,1000,ABC
01/01/2017,1111,ABCD
01/01/2017,1111,ABC
01/01/2017,2111,ABCD
08/01/2017,1000,XY
08/01/2017,2111,ABC
08/01/2017,2111,XYZ
15/01/2017,1000,ABC
15/01/2017,1000,ABCD
15/01/2017,1000,AA
22/01/2017,1111,ABC
22/01/2017,1000,ABC
;;;;
proc sql;
create table want as
select *
from sample
group by person
having count(distinct date)=
(select count(distinct date) from sample)
order by person,date;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.