I have a data set WORKERS. The variables are the worker's ID, maximum number of days on job (MaxDays) and days that the worker showed up to work (DayAtWork). Here's some trivial SAS code:
data workers; input id maxDays DayAtWork;
cards;
125 8 1
125 8 2
125 8 5
125 8 6
125 8 8
150 3 1
150 3 2
150 3 3
162 4 2
;
run;
Here's the data set I'd like to get:
ID Y N
125 5 3
150 3 0
162 1 3
The tricky part comes in because in my real data set each worker could have 100s of days with various combinations of days in which they worked and days that they were absent.
Any help much appreciated!
Andrew
The simplest is probably an SQL solution:
proc sql;
create table want as
select id,y,max-y as n
from(select id,count(distinct DayAtWork) as y,max(maxDays) as max from workers group by id);
quit;
I used COUNT(DISTINCT Dayatwork) in case the same day is repeated. The MAX(maxDays) is just to get a single value.
You can do it in a datastep (data must be sorted) like this:
data want;
do until(last.id);
set workers;
by id DayAtWork;
y=sum(y,first.DayAtWork);
end;
n=maxDays-y;
keep id y n;
run;
Y seems to be the frequency for the given ID, but how should N be calculated?
I think I see what you want:
data want;
set have (rename=(maxdays=y));
by id;
retain n;
if first.id
then n = 1;
else n + 1;
if last.id;
n = y - n;
keep id y n;
run;
Untested, posted from my tablet.
I don't think so. Take the following two examples:
One guy works days 1,2, and 5 of 5 days. The other works days 1, 2, 3, 4, 5 of 5 days.
Please provide example data where my code does not give the expected result, and the expected result for it.
Ah, we also need to re-calculate y:
data want;
set have (rename=(maxdays=y));
by id;
retain n;
if first.id
then n = 1;
else n + 1;
if last.id;
n = y - n;
y = y - n;
keep id y n;
run;
The simplest is probably an SQL solution:
proc sql;
create table want as
select id,y,max-y as n
from(select id,count(distinct DayAtWork) as y,max(maxDays) as max from workers group by id);
quit;
I used COUNT(DISTINCT Dayatwork) in case the same day is repeated. The MAX(maxDays) is just to get a single value.
You can do it in a datastep (data must be sorted) like this:
data want;
do until(last.id);
set workers;
by id DayAtWork;
y=sum(y,first.DayAtWork);
end;
n=maxDays-y;
keep id y n;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.