Mikkel_madsen
Obsidian | Level 7

Hi,

 

I'm quite new to SAS and have a complicated question. Hope someone is able to help me out.

 

Imagine this fictive dataset; let's call it "Main_dataset":

 

ID    Longitude       Latitude        Day
1     12.21236548     55.14578965     15FEB2017
2     12.52319752     55.21460198     19FEB2017
3     12.41219754     56.41301547     21APR2017
4     12.14016597     55.49783265     05JUN2018
5     12.13497326     55.65794325     21DEC2018
6     12.52319632     55.21446125     26DEC2018
7     12.41659721     56.41302146     21APR2019
8     12.14316592     55.49789875     05JUN2019
9     12.13478134     55.65795469     21DEC2019

 

Then I have three other datasets (that are quite similar); let's call them "Observations_2017", "Observations_2018", and "Observations_2019". The only difference is that the observations in each of them are limited to that year (2017, 2018, or 2019).

Let's look at one of them, e.g. "Observations_2017":

 

Day           Longitude       Latitude
01FEB2017     12.62315456     55.46532659
21FEB2017     12.65312348     55.64325689
05MAR2017     12.54693265     55.21302356
12JUN2017     12.06837264     55.21363221

 

So... What I want is to generate three new variables in my "Main_dataset" called "0_500", "500_1000" and "1000_1500". If we take the first observation in the "Main_dataset" with ID = 1, then I want the three new variables to count how many observations from "Observations_2017" (because the variable "Day" in the "Main_dataset" is from 2017) took place 0-500, 500-1000, and 1000-1500 meters away from the coordinates in ID = 1. It should only look for events that happened at or before 15FEB2017 (the "Day" in ID = 1). In this fictive dataset, I'm not sure that the observations actually are within these distances of each other. True line distance would be preferred, but straight-line distance is also acceptable if true line distance is impossible. If the "Day" in the "Main_dataset" is from 2018 or 2019, the new variables should be generated from "Observations_2018" or "Observations_2019", respectively.

 

I can't get my mind around this. Hope someone can help, and that the question is understandable. 

 

Thanks a lot! 

                

FreelanceReinh
Jade | Level 19

Hi @Mikkel_madsen and welcome to the SAS Support Communities!

 

I would (temporarily) combine the three "Observations" datasets and then apply PROC SQL, using the GEODIST function (which returns the geodetic distance, if that is acceptable).

 

Example:

/* Create test data */

data main;
input ID Longitude Latitude Day :date9.;
format day date9.;
cards;
1 12.21236548 55.14578965 15FEB2017
2 12.52319752 55.21460198 19FEB2017
3 12.41219754 56.41301547 21APR2017
4 12.14016597 55.49783265 05JUN2018
5 12.13497326 55.65794325 21DEC2018
6 12.52319632 55.21446125 26DEC2018
7 12.41659721 56.41302146 21APR2019
8 12.14316592 55.49789875 05JUN2019
9 12.13478134 55.50795469 21DEC2019
;

data Observations_2017;
input Day :date9. Longitude Latitude;
format day date9.;
cards;
01FEB2017 12.21175456 55.14562659
17FEB2017 12.21175456 55.14562659
05MAR2017 12.42293265 56.42302356
12JUN2017 12.06837264 55.21363221
;

data Observations_2018;
input Day :date9. Longitude Latitude;
format day date9.;
cards;
01FEB2018 12.14045456 55.49432659
21FEB2018 12.14052348 55.50925689
05MAR2018 12.53193265 55.21302356
12JUN2018 12.06837264 55.21363221
;

data Observations_2019;
input Day :date9. Longitude Latitude;
format day date9.;
cards;
01FEB2019 12.41615456 55.46532659
21FEB2019 12.15112348 55.48725689
05MAR2019 12.13393265 55.49902356
21DEC2019 12.13637264 55.50263221
;

/* Combine "Observations" datasets */

data obs_allyears;
set observations_20:;
run;

/* Derive the new variables D0_500, D500_1000 and D1000_1500 */

proc sql;
create table want as
select a.*, 
(select count(*) from obs_allyears b where .z<intnx('year',a.day,0)<=b.day<=a.day 
   & 0<=geodist(a.latitude, a.longitude, b.latitude, b.longitude)<=0.5) as d0_500,
(select count(*) from obs_allyears b where .z<intnx('year',a.day,0)<=b.day<=a.day 
   & 0.5<geodist(a.latitude, a.longitude, b.latitude, b.longitude)<=1)  as d500_1000,
(select count(*) from obs_allyears b where .z<intnx('year',a.day,0)<=b.day<=a.day 
   & 1<geodist(a.latitude, a.longitude, b.latitude, b.longitude)<=1.5)  as d1000_1500
from main a;
quit;
Mikkel_madsen
Obsidian | Level 7

Hi @FreelanceReinh 

 

Thanks a lot for your time! It does seem like the GEODIST function could work. I do, however, have some problems. I have successfully made one combined dataset for the three years (here called "obs_allyears"). Do the "a.latitude" and "b.latitude" refer to the latitude from the "main_dataset" and from "obs_allyears", respectively? Or is it the other way around? The same for "a.day" and "b.day". The variables in my real datasets (both the main and the combined) do not have the same name ('day') in both of them, so I'm getting a lot of error messages. 

 

Hope to hear from you once again, and thank you a lot! 

 

All best

FreelanceReinh
Jade | Level 19

@Mikkel_madsen wrote:

Does the "a.latitude" and "b.latitude" refer to the latitude from the "main_dataset" and from the "obs_allyears" respectively? 


Exactly. The letter after the table name in the FROM clause (e.g. "... from main a") defines that letter as an alias for the table name. Thus it's possible to distinguish between variables with the same name, but from different tables (i.e. SAS datasets), like a.day and b.day. Longer aliases or the original table names (without defining aliases) are admissible as well, e.g. main.day, obs_allyears.day.

 


@Mikkel_madsen wrote:

The variables in my real datasets (both the main and the combined) do not have the same name ('day') in both of them, so I'm getting a lot of error messages. 

Then just use the respective names. If a variable name is unique, you don't need an alias for the distinction. For example, if, say, obsdate (rather than day) was the date variable in obs_allyears, you could simply write

.z<intnx('year',day,0)<=obsdate<=day

in the WHERE clauses.

 

(The alias "a." in "select a.*" is redundant anyway and could be omitted, because the corresponding FROM clause refers just to a single dataset.)

Mikkel_madsen
Obsidian | Level 7

Thank you for being so kind and helpful @FreelanceReinh

 

I have changed the variable names to the ones used in the real datasets.

So far so good. Now it looks like:

 

proc sql;
create table Data_komplet_Geo as
select a.*,
(select count(*) from Data_komplet b where intnx('year',a.eventtime,0)<=b.Time<=a.eventtime 
& 0<=geodist(a.event_northing, a.event_easting, b.latitude, b.longitude)<=0.5) as d0_500,
(select count(*) from Data_komplet b where intnx('year',a.eventtime,0)<=b.Time<=a.eventtime
& 0.5<geodist(a.event_northing, a.event_easting, b.latitude, b.longitude)<=1) as d500_1000,
(select count(*) from Data_komplet b where intnx('year',a.eventtime,0)<=b.Time<=a.eventtime
& 1<geodist(a.event_northing, a.event_easting, b.latitude, b.longitude)<=1.5) as d1000_1500
from Final_database a;
quit;

 

 

Unfortunately, I still get a lot of error messages (the following): 

 

ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than (<) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than (<) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.

 

I'm not sure exactly why. My 'a.eventtime' is stored as: 14MAY201813:15:10 (May 14th 2018, with the last 13:15:10 being time of day, in hours:min:sec). My 'b.Time' is stored as: 30DEC1716:31:00 (again, December 30th 2017 at 16:31:00 in hour:min:sec). So there is a difference in how the year is stored (2018 vs just 17). Is that the problem? Or maybe the time of day (the hour:min:sec)?

 

Thank you for your patience. 

FreelanceReinh
Jade | Level 19

You're welcome.

 

The error messages clearly indicate that event_northing and event_easting are character variables, whereas the GEODIST function requires numeric arguments. You could do the conversion "on the fly" in the PROC SQL step by replacing a.event_northing by input(event_northing, 32.) and similarly for a.event_easting. (As mentioned earlier, you don't need the aliases since there are no name conflicts.) But I would rather create the two new numeric variables in a preliminary step as shown in the example below and use the newly created dataset in the PROC SQL step (i.e., Ultimate_database in place of Final_database). If Final_database is a huge dataset that you wouldn't want to duplicate just for the sake of a variable conversion, we can discuss other options.

/* Create test data */

data Final_database;
input ID event_northing :$20. event_easting :$20. Day :date9.;
format day date9.;
cards;
1 12.21236548 55.14578965 15FEB2017
;

/* Create numeric variables for northing and easting */

data Ultimate_database;
set Final_database(rename=(event_northing=event_northing_char event_easting=event_easting_char));
event_northing=input(event_northing_char, 32.);
event_easting=input(event_easting_char, 32.);
drop event_northing_char event_easting_char; /* <-- optional: remove the character versions */
run;

Similar considerations apply to the Time and eventtime variables. The latter could be converted in the above DATA step, the former in the DATA step creating Data_komplet. The details of the datetime conversions depend on the variable types of Time and eventtime (i.e., character or numeric, please see PROC CONTENTS output of the corresponding datasets) and, if numeric, on the kind of values. Moreover, the first argument of the INTNX function calls must be changed to 'dtyear' if the second argument is a SAS datetime, not a SAS date.
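
For illustration, a minimal sketch of such a datetime conversion, assuming (hypothetically) that eventtime were a character variable holding strings like '14MAY2018:13:15:10'; if PROC CONTENTS shows it as numeric with a DATETIME format, no conversion is needed:

/* Sketch only: convert a character datetime string to a numeric SAS datetime value */
data Ultimate_database2; /* hypothetical dataset name */
set Ultimate_database(rename=(eventtime=eventtime_char));
eventtime=input(eventtime_char, datetime20.); /* the DATETIME. informat reads values like '14MAY2018:13:15:10' */
format eventtime datetime19.;
drop eventtime_char;
run;

The INTNX calls in the PROC SQL step would then use 'dtyear' as their first argument, as mentioned above.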

Mikkel_madsen
Obsidian | Level 7

It seems that converting 'on the fly' with input(event_northing,32.) and input(event_easting,32.) solved the problem! The program runs without any errors now, so I guess there is no problem with the dates. 

 

Thank you very much, @FreelanceReinh. I would have been stuck for weeks without your quick help! Hope I may continue this thread should I encounter other problems in relation to this topic. 

 

Have a nice day. 

FreelanceReinh
Jade | Level 19

Glad to hear that my suggestions worked for you.

 

I think each of the following conditions must be met so that indeed "there is no problem with the dates." I would definitely check this.

  1. Both Time and eventtime are numeric variables in their respective datasets.
  2. When you saw human readable values such as 14MAY2018:13:15:10 (typically with a colon or a "T" between year and hour), you applied a format suitable for SAS datetime values, e.g. datetime19.. Most likely, the format would be permanently associated with the  Time or eventtime variable (see PROC CONTENTS output).
  3. Typical unformatted values of these variables are about 1,800,000,000 for datetimes from 2017-2019, because '01JAN2017:00:00'dt=1798848000 and '01JAN2020:00:00'dt=1893456000. You can check these values with PROC MEANS, e.g.:
    proc means data=Data_komplet min median max;
    var Time;
    run;
    (and similarly for the other dataset).
  4. You used 'dtyear', not 'year' as the time interval in the INTNX function calls, as I suggested.

 

And, yes, of course, feel free to ask if new questions arise (which are so close to the topic of the current thread that opening a new one would seem inappropriate).

Mikkel_madsen
Obsidian | Level 7

Hi again @FreelanceReinh 

 

I have checked the following and can add:

 

1) When I look in the original datasets (Data_komplet and Final_database), 'eventtime' (from Final_database) is numeric with the format datetime19., while 'time' (from Data_komplet) is numeric with the format datetime18. (I right-click on the variable name and select 'Properties', where the type and format, e.g. numeric and datetime19., are shown. The same is found when using PROC CONTENTS.)

 

Is that ok? Or should 'time' and 'eventtime' have the same format (e.g. datetime19. or datetime18.)?

 

2) In both datasets I see human-readable values (in Data_komplet e.g. 12MAY0714:40:00 and in Final_database e.g. 01JUN2018:10:46:45) - that is, there is a difference between the datasets: a "missing" colon between the year and the time in Data_komplet, and a "missing" 20 in the year in Data_komplet. I am not sure whether this will hinder the analyses. 

 

3) In Final_database the median is 1838539767 while the median in Data_komplet is 1789633440 (though I guess this is because I have observations older than 2017 in the Data_komplet dataset e.g. from 2007). 

 

4) I changed 'year' to 'dtyear'. It did, however, result in a significant drop in the counts in d0_500, d500_1000, and d1000_1500?

 

I know I must sound a bit confused, but I'm very thankful for your enormous help! I might have some extra questions and wishes for the program, but I will focus on the problems above first. Once again, thank you! 

FreelanceReinh
Jade | Level 19

Hi @Mikkel_madsen,

 

Thanks for checking. Here are my comments on items 1 - 4:

  1. Numeric is good. Formats have no impact on calculations (because SAS uses the internal values), so the (minor) difference between datetime19. and datetime18. is irrelevant for our purposes.
  2. The formats produce the human-readable values. I'm surprised about the missing colon between date and time in your reported datetime18. value. This omission does not occur with my SAS 9.4 and I have never seen this before. Nevertheless, it would be irrelevant for the analysis.
  3. These median values look plausible. Make sure that this is true for the extreme values (minimum and maximum) as well, i.e., no datetime values from, say, 1960 or 2099 due to data errors.
  4. Well done! The higher numbers in d0_500 etc. obtained with 'year' must have been plain wrong! See the example and further explanations below.

Example:

data test;
input dt;
dt0=intnx('dtyear',dt,0); /* correct */
dtx=intnx('year',dt,0); /* incorrect */
cards;
1838539767
.
;

proc print data=test;
format dt: datetime18.;
run;

Log of the DATA step:

1164  data test;
1165  input dt;
1166  dt0=intnx('dtyear',dt,0); /* correct */
1167  dtx=intnx('year',dt,0); /* incorrect */
1168  cards;

NOTE: Invalid argument to function INTNX('year',1838539767,0) at line 1167 column 5.
RULE:       ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1169        1838539767
dt=1838539767 dt0=1830384000 dtx=. _ERROR_=1 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 1166:5   1 at 1167:5
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 1167:5
NOTE: The data set WORK.TEST has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The invalid combination of 'year' in the first and a datetime value (more precisely: a value too large for a SAS date value) in the second argument of INTNX causes the first and the third note in the log and also the black text between the first and the second note. The second note is triggered by the missing value in the second observation, which I added for demonstration.

 

Output of PROC PRINT (btw, note the colon between date and time):

Obs                    dt                   dt0                   dtx

 1       05APR18:09:29:27      01JAN18:00:00:00                     .
 2                      .                     .                     .

The incorrect interval specification 'year' leads to a missing value of dtx, as does (less surprisingly) the missing value in dt itself, whereas dt0 contains the intended start-of-the-year datetime value for non-missing dt.

 

PROC SQL tends to be less verbose in terms of relevant log messages:

proc sql;
select intnx('dtyear',dt,0) as dt0,
       intnx('year',dt,0)   as dtx
from test;
quit;

Log:

1176  proc sql;
1177  select intnx('dtyear',dt,0) as dt0,
1178         intnx('year',dt,0)   as dtx
1179  from test;
NOTE: Invalid argument to function INTNX. Missing values may be generated.
NOTE: Invalid (or missing) arguments to the INTNX function have caused the function to return a missing value.
1180  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Only a one-line note for each of the two issues.

 

However, a WHERE clause in PROC SQL, and likewise a WHERE statement in a DATA step or a WHERE= dataset option, does not write any suspicious note about these data/code issues to the log!

data sel;
set test;
where intnx('year',dt,0)<='01JAN2017:00:00'dt;
run;

Log:

1181  data sel;
1182  set test;
1183  where intnx('year',dt,0)<='01JAN2017:00:00'dt;
1184  run;

NOTE: There were 2 observations read from the data set WORK.TEST.
      WHERE INTNX('year', dt, 0)<='  01JAN2017:00:00:00'DT;
NOTE: The data set WORK.SEL has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Both observations satisfy the WHERE condition, because in both cases intnx('year',dt,0) results in a missing value and missing values are considered smaller than non-missing values. With the correct 'dtyear' the first observation (the datetime from 2018) would clearly not have been selected.

 

So, with the incorrect 'year' interval the first part of the WHERE condition in your program (the first "<=" inequality) was effectively deactivated (namely always true, regardless of the datetime values), hence the additional counts in d0_500 etc. It's always shocking to see how such mistakes can corrupt results while the SAS log is clean.

 

As a precaution it is advisable to exclude missing values in WHERE or IF conditions on the "less than" side of comparison operators. In your example:

... where .z<intnx('dtyear',a.eventtime,0)<=b.Time<=a.eventtime ...

(using the largest special missing value .z, but in most situations the ordinary missing value . would be sufficient). This would also prevent the WHERE condition from being satisfied if both of the two datetime values were missing.
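
A minimal sketch of why this matters (missing values sort below all non-missing values, so a plain "<=" comparison lets them through):

data _null_;
dt = .;                                    /* missing datetime value */
if dt <= '01JAN2017:00:00'dt then
   put 'Plain <=      : the missing value passes the test';
if .z < dt and dt <= '01JAN2017:00:00'dt then
   put 'With .z guard : this line is never written';   /* . sorts below .z */
run;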

 

I have inserted the ".z<" into my earlier posts of this thread in order to demonstrate this defensive programming style, also to later readers.

 

Mikkel_madsen
Obsidian | Level 7

Hi again  @FreelanceReinh 

 

Thank you for your thorough walkthrough. I think I can follow it; however, it's a bit complicated.

I guess that everything works correctly now. I'm using SAS Enterprise Guide v. 9.1; maybe that's why there's a missing colon in the dates? However, if it does not matter for the analysis to be done correctly, then I guess it's okay anyway. 

 

I have also inserted the ".z<" in the program.

 

Thank you for your kindness and help! Have a nice weekend!

Mikkel_madsen
Obsidian | Level 7

Hi @FreelanceReinh 

 

I hope you are well and that I may request some further help from you. 

You helped me generate this programme:

 

 

proc sql;
create table Data_komplet_Geo as
select a.*, 
(select count(*) from Data_komplet b where .z<intnx('dtyear',a.eventtime,0)<=b.createdtime<=a.eventtime 
   & 0<=geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=0.360) as d0_360,
(select count(*) from Data_komplet b where .z<intnx('dtyear',a.eventtime,0)<=b.createdtime<=a.eventtime
   & 0.360<geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=1.08)  as d360_1080,
(select count(*) from Data_komplet b where .z<intnx('dtyear',a.eventtime,0)<=b.createdtime<=a.eventtime
   & 1.08<geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=1.8)  as d1080_1800
from Final_database a;
quit;

 

 

It worked well, but I would now like some more. You might remember my two datasets: "Final_database" and "Data_komplet" (the three datasets I joined). They looked like:

 

Final_database:

 

ID    Longitude       Latitude        Eventtime
1     12.21236548     55.14578965     15FEB201713:15:10
2     12.52319752     55.21460198     19FEB201717:12:10
3     12.41219754     56.41301547     21APR201711:03:11
4     12.14016597     55.49783265     05JUN201808:05:44
5     12.13497326     55.65794325     21DEC201815:59:44
6     12.52319632     55.21446125     26DEC201806:30:12
7     12.41659721     56.41302146     21APR201922:12:42
8     12.14316592     55.49789875     05JUN201902:22:23
9     12.13478134     55.65795469     21DEC2019

 

Data_komplet:

 

Createdtime          Longitude       Latitude
01FEB1712:00:00      12.62315456     55.46532659
21FEB1708:00:00      12.65312348     55.64325689
05MAR1709:00:00      12.54693265     55.21302356
12JUN1712:00:00      12.06837264     55.21363221

 

My 'Data_komplet' database has changed a bit to:

 

Var   Createdtime          Longitude       Latitude        Type        Open         Close        24/7
1     01FEB1712:00:00      12.62315456     55.46532659     Monday      08:00:00     22:00:00     N
1     01FEB1712:00:00      12.62315456     55.46532659     Tuesday     08:00:00     22:00:00     N
1     01FEB1712:00:00      12.62315456     55.46532659     Wednesday   08:00:00     22:00:00     N
1     01FEB1712:00:00      12.62315456     55.46532659     Thursday    08:00:00     22:00:00     N
1     01FEB1712:00:00      12.62315456     55.46532659     Friday      08:00:00     22:00:00     N
2     21FEB1708:00:00      12.65312348     55.64325689     Monday      12:00:00     22:00:00     N
2     21FEB1708:00:00      12.65312348     55.64325689     Tuesday     08:00:00     22:00:00     N
2     21FEB1708:00:00      12.65312348     55.64325689     Wednesday   10:00:00     22:00:00     N
2     21FEB1708:00:00      12.65312348     55.64325689     Thursday    08:00:00     22:00:00     N
3     05MAR1709:00:00      12.54693265     55.21302356     .           .            .            Y
4     12JUN1712:00:00      12.06837264     55.21363221     .           .            .            Y

 

So... The objective is the same: to count how many observations from 'Data_komplet' lie within 0-360, 360-1080 and 1080-1800 meters from the observations in 'Final_database'. But it should only count those where eventtime <= createdtime. Further, it should only count those observations from 'Data_komplet' that are either 24/7 (in the first 'Data_komplet' where you helped me, all observations were 24/7) or those where the day in 'Type' matches the weekday of 'Eventtime' and where the time (from 'eventtime') is within the opening hours (between 'Open' and 'Close'). 

 

Phew. A bit complicated to explain! Hope that it is clear and that you have a suggestion - if it is possible! 

 

All best,

MM

   

 

FreelanceReinh
Jade | Level 19

Hi @Mikkel_madsen,

 


@Mikkel_madsen wrote:

So... The objective is the same: to count how many observations from 'Data_komplet' lie within 0-360, 360-1080 and 1080-1800 meters from the observations in 'Final_database'. But it should only count those where eventtime <= createdtime. Further, it should only count those observations from 'Data_komplet' that are either 24/7 (in the first 'Data_komplet' where you helped me, all observations were 24/7) or those where the day in 'Type' matches the weekday of 'Eventtime' and where the time (from 'eventtime') is within the opening hours (between 'Open' and 'Close').


So you'll need to modify the WHERE conditions to something like the example below (and analogously for the other two distance ranges):

  .z<a.eventtime<=b.createdtime
& 0.360<geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=1.08)
& (b.f247='Y' | put(datepart(a.eventtime),downame.-l)=left(b.type) & b.open<=timepart(a.eventtime)<=b.close)

The second line (the check of the distance) is unchanged. The modified first line reflects your new criterion, which no longer requires eventtime and createdtime to be in the same year (i.e., now an event in, say, 2016 would also meet the condition with createdtime values in 2017, 2018, etc.).

 

The newly added third line contains the new criteria regarding "24/7" and opening hours/weekday. However, this part might need to be changed depending on your data structure:

  • Most likely, the real name of the "24/7" flag variable is not f247 (which is what I used as an example), so please change it correspondingly. Note that "24/7" is not a valid SAS variable name. Possibly the name is '24/7'n, but I would not recommend using these (so called) name literals. It's not even clear from your post whether this is a character variable or perhaps a formatted numeric variable (like 0='N', 1='Y'), which would require a different WHERE condition.
  • Similar ambiguities exist for variable Type. The code above assumes a character variable Type containing values like those shown in your post.
  • Also the last part of the condition involves an assumption I had to make: that variables Open and Close are numeric and contain SAS time values, i.e., numbers of seconds after midnight. If they were character variables (containing values like '08:00:00'), you would convert them "on the fly" like input(b.open,time8.) instead of b.open etc.

Note further that the incomplete eventtime value shown for ID 9 ("21DEC2019") cannot exist in a SAS datetime variable.

Mikkel_madsen
Obsidian | Level 7

Hi @FreelanceReinh 

 

Thank you for always being so helpful and kind! 

 

I made a mistake with the createdtime and eventtime. It should only (as in the first programme) count eventtimes that took place after createdtime - so I left the first line unchanged (as in the old programme). 

 

I have checked my variables:

1) The variable "247" (which is the correct name of the variable) is a yes/no character variable (in the format "$CHAR3.). 

2) The "Type" variable is also a character variable (in the format $CHAR9.). The values in the "Type" variable is the seven different weekdays (e.g. Monday, tuesday etc.). 

3) The variables "Open" and "Close" are both numeric variables with the format TIME8. and the informat TIME11. 

 

Does it require any adjustments? 

 

Thanks again! Have a nice day 🙂

 

All best MM

FreelanceReinh
Jade | Level 19

Hi @Mikkel_madsen,

 

You're welcome.


@Mikkel_madsen wrote:

I have checked my variables:

1) The variable "247" (which is the correct name of the variable) is a yes/no character variable (in the format "$CHAR3.). 

2) The "Type" variable is also a character variable (in the format $CHAR9.). The values in the "Type" variable is the seven different weekdays (e.g. Monday, tuesday etc.). 

3) The variables "Open" and "Close" are both numeric variables with the format TIME8. and the informat TIME11. 

 

Does it require any adjustments? 


1) Only a name literal such as "247"n can start with a digit (but it's displayed without the quotes and the n). If this is the name of the variable and it has length 3 (see format length) because it contains values 'Yes' and 'No' (rather than 'Y' and 'N'), then, of course, the pertinent condition needs to be modified: Instead of

b.f247='Y'

you could use

upcase(b.'247'n) eqt 'Y'

This would work with values 'Yes', 'yes', 'YES', 'Y' and 'y' (left-aligned in case of single characters; otherwise add the LEFT function), even with 'yep' etc.

 

2) The weekday criterion I proposed is case sensitive, i.e., 'Tuesday' would match, but not 'tuesday' or 'TUESDAY'. To be on the safe side, you may want to add the UPCASE function to both sides of the condition:

upcase(put(datepart(a.eventtime),downame.-l))=upcase(left(b.type))

3) The time variable metadata look good.
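
Putting the adjusted pieces together, one of the three subqueries might then look like the sketch below (using the names from this thread, keeping the date condition from your earlier program, and showing only the 0-360 m band; the name literal '247'n is assumed as discussed above):

/* Sketch only */
options validvarname=any; /* likely needed to reference the non-standard variable name "247" via a name literal */
proc sql;
create table Data_komplet_Geo as
select a.*,
(select count(*) from Data_komplet b
   where .z<intnx('dtyear',a.eventtime,0)<=b.createdtime<=a.eventtime
   & 0<=geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=0.360
   & (upcase(b.'247'n) eqt 'Y'
      | upcase(put(datepart(a.eventtime),downame.-l))=upcase(left(b.type))
        & b.open<=timepart(a.eventtime)<=b.close)) as d0_360
from Final_database a;
quit;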
