Hi I have this example dataframe;
PLACE | DATE | NUMBER |
Toronto | 12JAN20201 | 0 |
Toronto | 13JAN20201 | 0 |
Toronto | 14JAN20201 | 99 |
Toronto | 15JAN20201 | 99 |
i want a new column in my data called DAYS - is should basically be based on if NUMBER is more than 10 then is shoul count from when it was less until when it became more than 10 from the earliest date - so in this case at TORONTO, 14JAN2021, 99 it would say in the new column 2 days
hope it makes senes
Ok. This gives you what you want.
data have;
input PLACE $ 1-7 Method $ 10-11 DATE :ddmmyy10. NUMBER;
format date ddmmyy10.;
datalines;
PLACE A A 12/01/2021 712
PLACE A B 12/01/2021 712
PLACE A A 13/01/2021 713
PLACE A B 13/01/2021 73
PLACE A A 14/01/2021 75
PLACE A B 14/01/2021 7
PLACE A A 15/01/2021 75
PLACE A B 15/01/2021 112
PLACE B A 12/01/2021 8
PLACE B B 12/01/2021 16
PLACE B A 13/01/2021 85
PLACE B B 13/01/2021 15
PLACE B A 14/01/2021 85
PLACE B B 14/01/2021 1
PLACE B A 15/01/2021 86
PLACE B B 15/01/2021 16
PLACE C A 12/01/2021 188
PLACE C B 12/01/2021 345
PLACE C A 13/01/2021 188
PLACE C B 13/01/2021 67
PLACE C A 14/01/2021 188
PLACE C B 14/01/2021 67
PLACE C A 15/01/2021 188
PLACE C B 15/01/2021 67
;
data want;
if _N_ = 1 then do;
dcl hash h();
h.definekey('place', 'method');
h.definedata('number', 'date');
h.definedone();
end;
set have;
by place;
if first.place then do;
a = 0; b = 0;
end;
n = number;
d = date;
if h.find() ne 0 then new = 0;
else do;
if n > 10 & number <= 10 then new = d - date;
date = d;
number = n;
end;
h.replace();
number = n;
date = d;
if method = "A" then a = max(a, new);
if method = "B" then b = max(b, new);
if last.place;
retain a b;
keep place a b;
run;
Result:
PLACE a b PLACE A 0 1 PLACE B 1 1 PLACE C 0 0
You may need to provide a more complex example. There are some questions that you need to answer or demonstrate in your data.
1) Do you have more than one value for place and should the process start over with the "earliest date" when a new Place is encountered
2) is your data sorted by Place and date?
3) Does the first record/"earliest date" for Place always have a 0? Always have a value less than 10?
4) Does the value of number decrease and then increase? If the value of number decreases to below 10 might that become an "earliest date"
5) Is your date variable a SAS date value, i.e. a numeric variable with a Date format assigned?
6) What value should days have, and on which records, if Number never goes over 10?
7) Is the value of Days only to appear on one record? More than 1? If more than 1 which records?
Do you have more than one plane? Can there be gaps in your data?
Please be more specific in your requirements and provide exactly what your resulting data should look like
This is a part of my original data
The numbers is the capacity of available time of test
in this case all numbers are over 10 then from theres a waiting time of test at 0 days
PLACE | Method | DATE | NUMBER | new column |
T Covid-19 Test place | A | 12/01/2021 | 87 | 0 days |
T Covid-19 Test place | B | 12/01/2021 | 62 | |
T Covid-19 Test place | A | 13/01/2021 | 100 | |
T Covid-19 Test place | B | 13/01/2021 | 77 | |
T Covid-19 Test place | A | 14/01/2021 | 188 | |
T Covid-19 Test place | B | 14/01/2021 | 345 | |
T Covid-19 Test place | A | 15/01/2021 | 188 | |
T Covid-19 Test place | B | 15/01/2021 | 67 |
if the case is like this:
PLACE | Method | DATE | NUMBER | new column |
T Covid-19 Test place | A | 12/01/2021 | 7 | |
T Covid-19 Test place | B | 12/01/2021 | 7 | |
T Covid-19 Test place | A | 13/01/2021 | 8 | |
T Covid-19 Test place | B | 13/01/2021 | 1 | |
T Covid-19 Test place | A | 14/01/2021 | 188 | 2 days |
T Covid-19 Test place | B | 14/01/2021 | 345 | |
T Covid-19 Test place | A | 15/01/2021 | 188 | |
T Covid-19 Test place | B | 15/01/2021 | 67 |
ithe a case could also be like this:
PLACE | Method | DATE | NUMBER | new column |
T Covid-19 Test place | A | 12/01/2021 | 1 | |
T Covid-19 Test place | B | 12/01/2021 | 3 | |
T Covid-19 Test place | A | 13/01/2021 | 78 | 1 day |
T Covid-19 Test place | B | 13/01/2021 | 77 | |
T Covid-19 Test place | A | 14/01/2021 | 188 | |
T Covid-19 Test place | B | 14/01/2021 | 1 | 1 day |
T Covid-19 Test place | A | 15/01/2021 | 1119 | |
T Covid-19 Test place | B | 15/01/2021 | 67 |
1) There is always one value for each place and it should always count from the earliest date.
2) my data is sorted by date
3)The first record do not always have the value 0 - it was just to illustrate that if theres is a values less than 10 from the earliest date until the 3rd date then the new column will say 2 days - as it took to days for that place to have a value over 10
4)the counting should always start from the earliest date - if the earliest date is over 10 from the beginning then the new column should say 0 days etc.
5) my date format is actually DDMMYY10.
6) for a place that have under 10 all days then the column should says "NO DAYS AVAILABLE" - if the place has over 10 all days then it will be "0 days"
7) the days should appear every new case as shown in table 3
And if there are gaps or no values then "No days available"
Hope it is understandable
It appears that METHOD is involved in your calculation. Can you state the rule(s) involved?
There can be different methods for th same place - is should actually also consider that in the calculations - i forgot to mention that.
so the "days" label should be stated for each method
so if:
PLACE | METHOD | DATE | number | DAYS |
T covid-19 test place | A | 12/01/2021 | 12 | 0 days |
T covid-19 test place | B | 12/01/2021 | 67 | 0 days |
i
I'm still not quite sure here. You say: "1) There is always one value for each place and it should always count from the earliest date.".
Does that mean you have multiple different values for Place? I.e. Could place both be "T covid-19 test place" and "S covid-19 test place" ?
I have multiple places - and different places - them same place has a method A and B
each place goes from current date and 2 weeks ahead
when a specific place for method A has a number over 10 for all dates then it means this place has 0 days waiting time
if it has for method B for all dates a number below 10 then this place for method B has no available time.
If the cases is that method A starts with a number under 10 on th 12th and 13th, but then becomes over 10 at the 14th then the places has a waiting time at 2 days
but there are multiple places where the earliest date is the current date for each new place
An no each places has different names
Ok. And why is the new column = 1 in the third-last row of your last example?
I thought number should be > 10 for the new column to have a value?
Sorry typing mistake.. the 1 day should be on the next day when its over 10 🙂 so at number = 67
If a place has over 10 at the 12th, then under 10 at the 13 th, and then over 10 at the 14th and then under 10 for the rest of the days - then the waiting time will 1 day.
The definition is: waitingtime until next day where the number is at least 10
Since you seem to want to treat the methods separately, you need to sort by method before date. Try this:
data have;
infile datalines dlm="," dsd truncover;
input place :$30. method :$1. date :ddmmyy10. number;
format date yymmdd10.;
datalines;
T Covid-19 Test place,A,12/01/2021,7
T Covid-19 Test place,B,12/01/2021,7
T Covid-19 Test place,A,13/01/2021,8
T Covid-19 Test place,B,13/01/2021,1
T Covid-19 Test place,A,14/01/2021,188
T Covid-19 Test place,B,14/01/2021,345
T Covid-19 Test place,A,15/01/2021,188
T Covid-19 Test place,B,15/01/2021,67
;
proc sort data=have;
by place method date;
run;
data want;
set have;
by place method;
retain s_date;
if first.method then s_date = .;
if s_date = . and number lt 10 then s_date = date;
if s_date ne . and number ge 10
then do;
days = date - s_date;
s_date = .;
end;
drop s_date;
run;
Note how source data is supplied in a data step with datalines, so everyone can create the dataset exactly as is with a simple copy/paste and submit. It also leaves no doubts about dataset structure and contents.
See if this helps you
data have;
input PLACE $ 1-21 Method $ 23-24 DATE :ddmmyy10. NUMBER;
format date ddmmyy10.;
datalines;
T Covid-19 Test place A 12/01/2021 7
T Covid-19 Test place B 12/01/2021 7
T Covid-19 Test place A 13/01/2021 8
T Covid-19 Test place B 13/01/2021 1
T Covid-19 Test place A 14/01/2021 188
T Covid-19 Test place B 14/01/2021 345
T Covid-19 Test place A 15/01/2021 188
T Covid-19 Test place B 15/01/2021 67
;
data want;
if _N_ = 1 then do;
dcl hash h();
h.definekey('place', 'method');
h.definedata('number', 'date');
h.definedone();
end;
set have;
n = number;
d = date;
if h.find() ne 0 then do;
new = 0;
h.replace();
end;
else do;
if n > 10 & number <= 10 then do;
new = d - date;
date = d;
number = n;
h.replace();
end;
end;
number = n;
date = d;
keep PLACE Method DATE NUMBER new;
run;
Just made a small correction.
data have;
input PLACE $ 1-21 Method $ 23-24 DATE :ddmmyy10. NUMBER;
format date ddmmyy10.;
datalines;
T Covid-19 Test place A 12/01/2021 7
T Covid-19 Test place B 12/01/2021 7
T Covid-19 Test place A 13/01/2021 8
T Covid-19 Test place B 13/01/2021 1
T Covid-19 Test place A 14/01/2021 188
T Covid-19 Test place B 14/01/2021 345
T Covid-19 Test place A 15/01/2021 188
T Covid-19 Test place B 15/01/2021 67
;
data want;
if _N_ = 1 then do;
dcl hash h();
h.definekey('place', 'method');
h.definedata('number', 'date');
h.definedone();
end;
set have;
n = number;
d = date;
if h.find() ne 0 then new = 0;
else do;
if n > 10 & number <= 10 then new = d - date;
date = d;
number = n;
end;
h.replace();
number = n;
date = d;
keep PLACE Method DATE NUMBER new;
run;
Result:
PLACE Method DATE NUMBER new T Covid-19 Test place A 12/01/2021 7 0 T Covid-19 Test place B 12/01/2021 7 0 T Covid-19 Test place A 13/01/2021 8 . T Covid-19 Test place B 13/01/2021 1 . T Covid-19 Test place A 14/01/2021 188 1 T Covid-19 Test place B 14/01/2021 345 1 T Covid-19 Test place A 15/01/2021 188 . T Covid-19 Test place B 15/01/2021 67 .
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.