Mikkel_madsen
Obsidian | Level 7

Hi,

 

I'm quite new to SAS and have a complicated question. Hope someone is able to help me out.

 

Imagine this fictive dataset; let's call it "Main_dataset":

 

ID    Longitude       Latitude        Day
1     12.21236548     55.14578965     15FEB2017
2     12.52319752     55.21460198     19FEB2017
3     12.41219754     56.41301547     21APR2017
4     12.14016597     55.49783265     05JUN2018
5     12.13497326     55.65794325     21DEC2018
6     12.52319632     55.21446125     26DEC2018
7     12.41659721     56.41302146     21APR2019
8     12.14316592     55.49789875     05JUN2019
9     12.13478134     55.65795469     21DEC2019

 

Then I have three other datasets (that are quite similar); let's call them "Observations_2017", "Observations_2018", and "Observations_2019". The only difference is that the observations in each of them are limited to that year (2017, 2018, or 2019).

Let's look at one of them, e.g. "Observations_2017":

 

Day           Longitude       Latitude
01FEB2017     12.62315456     55.46532659
21FEB2017     12.65312348     55.64325689
05MAR2017     12.54693265     55.21302356
12JUN2017     12.06837264     55.21363221

 

So... What I want is to generate three new variables in my "Main_dataset" called "0_500", "500_1000" and "1000_1500". If we take the first observation in the "Main_dataset" with ID = 1, then I want the three new variables to count how many observations from "Observations_2017" (because the variable "Day" in the "Main_dataset" is from 2017) took place 0-500, 500-1000, and 1000-1500 meters away from the coordinates in ID = 1. It should only look for events that happened at or before 15FEB2017 (the "Day" in ID = 1). In this fictive dataset, I'm not sure that the observations actually are within these distances of each other. True line distance would be preferred, but straight-line distance is also acceptable if true line distance is impossible. If the "Day" in the "Main_dataset" is from 2018 or 2019, the new variables should be generated from "Observations_2018" or "Observations_2019", respectively.

 

I can't get my mind around this. Hope someone can help, and that the question is understandable. 

 

Thanks a lot! 

                

FreelanceReinh
Jade | Level 19

Hi @Mikkel_madsen and welcome to the SAS Support Communities!

 

I would (temporarily) combine the three "Observations" datasets and then apply PROC SQL, using the GEODIST function (which returns the geodetic distance, if that is acceptable).

 

Example:

/* Create test data */

data main;
input ID Longitude Latitude Day :date9.;
format day date9.;
cards;
1 12.21236548 55.14578965 15FEB2017
2 12.52319752 55.21460198 19FEB2017
3 12.41219754 56.41301547 21APR2017
4 12.14016597 55.49783265 05JUN2018
5 12.13497326 55.65794325 21DEC2018
6 12.52319632 55.21446125 26DEC2018
7 12.41659721 56.41302146 21APR2019
8 12.14316592 55.49789875 05JUN2019
9 12.13478134 55.50795469 21DEC2019
;

data Observations_2017;
input Day :date9. Longitude Latitude;
format day date9.;
cards;
01FEB2017 12.21175456 55.14562659
17FEB2017 12.21175456 55.14562659
05MAR2017 12.42293265 56.42302356
12JUN2017 12.06837264 55.21363221
;

data Observations_2018;
input Day :date9. Longitude Latitude;
format day date9.;
cards;
01FEB2018 12.14045456 55.49432659
21FEB2018 12.14052348 55.50925689
05MAR2018 12.53193265 55.21302356
12JUN2018 12.06837264 55.21363221
;

data Observations_2019;
input Day :date9. Longitude Latitude;
format day date9.;
cards;
01FEB2019 12.41615456 55.46532659
21FEB2019 12.15112348 55.48725689
05MAR2019 12.13393265 55.49902356
21DEC2019 12.13637264 55.50263221
;

/* Combine "Observations" datasets */

data obs_allyears;
set observations_20:;
run;

/* Derive the new variables D0_500, D500_1000 and D1000_1500 */

proc sql;
create table want as
select a.*, 
(select count(*) from obs_allyears b where .z<intnx('year',a.day,0)<=b.day<=a.day 
   & 0<=geodist(a.latitude, a.longitude, b.latitude, b.longitude)<=0.5) as d0_500,
(select count(*) from obs_allyears b where .z<intnx('year',a.day,0)<=b.day<=a.day 
   & 0.5<geodist(a.latitude, a.longitude, b.latitude, b.longitude)<=1)  as d500_1000,
(select count(*) from obs_allyears b where .z<intnx('year',a.day,0)<=b.day<=a.day 
   & 1<geodist(a.latitude, a.longitude, b.latitude, b.longitude)<=1.5)  as d1000_1500
from main a;
quit;
Mikkel_madsen
Obsidian | Level 7

Hi @FreelanceReinh 

 

Thanks a lot for your time! It does seem like the GEODIST function could work. I do, however, have some problems. I have successfully made one combined dataset for the three years (here called "obs_allyears"). Do the "a.latitude" and "b.latitude" refer to the latitude from the "main_dataset" and from "obs_allyears", respectively? Or is it the other way around? The same for "a.day" and "b.day". The variables in my real datasets (both the main and the combined) do not have the same name ('day') in both of them, so I'm getting a lot of error messages. 

 

Hope to hear from you once again, and thank you a lot! 

 

All best

FreelanceReinh
Jade | Level 19

@Mikkel_madsen wrote:

Does the "a.latitude" and "b.latitude" refer to the latitude from the "main_dataset" and from the "obs_allyears" respectively? 


Exactly. The letter after the table name in the FROM clause (e.g. "... from main a") defines that letter as an alias for the table name. Thus it's possible to distinguish between variables with the same name, but from different tables (i.e. SAS datasets), like a.day and b.day. Longer aliases or the original table names (without defining aliases) are admissible as well, e.g. main.day, obs_allyears.day.

 


@Mikkel_madsen wrote:

The variables in my real datasets (both the main and the combined) do not have the same name ('day') in both of them, so I'm getting a lot of error messages. 

Then just use the respective names. If a variable name is unique, you don't need an alias for the distinction. For example, if, say, obsdate (rather than day) was the date variable in obs_allyears, you could simply write

.z<intnx('year',day,0)<=obsdate<=day

in the WHERE clauses.

 

(The alias "a." in "select a.*" is redundant anyway and could be omitted, because the corresponding FROM clause refers just to a single dataset.)

Mikkel_madsen
Obsidian | Level 7

Thank you for being so kind and helpful @FreelanceReinh

 

I have changed the variable names to the ones used in the real datasets.

So far so good. Now it looks like:

 

proc sql;
create table Data_komplet_Geo as
select a.*,
(select count(*) from Data_komplet b where intnx('year',a.eventtime,0)<=b.Time<=a.eventtime 
& 0<=geodist(a.event_northing, a.event_easting, b.latitude, b.longitude)<=0.5) as d0_500,
(select count(*) from Data_komplet b where intnx('year',a.eventtime,0)<=b.Time<=a.eventtime
& 0.5<geodist(a.event_northing, a.event_easting, b.latitude, b.longitude)<=1) as d500_1000,
(select count(*) from Data_komplet b where intnx('year',a.eventtime,0)<=b.Time<=a.eventtime
& 1<geodist(a.event_northing, a.event_easting, b.latitude, b.longitude)<=1.5) as d1000_1500
from Final_database a;
quit;

 

 

Unfortunately, I still get a lot of error messages (the following): 

 

ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than (<) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than (<) has components that are of different data types.
ERROR: Function GEODIST requires a numeric expression as argument 1.
ERROR: Function GEODIST requires a numeric expression as argument 2.
ERROR: Expression using less than or equal (<=) has components that are of different data types.

 

I'm not sure exactly why. My 'a.eventtime' is stored as: 14MAY201813:15:10 (May 14th 2018, with the last 13:15:10 being time of day, in hours:min:sec). My 'b.Time' is stored as: 30DEC1716:31:00 (again, December 30th 2017 at 16:31:00 in hour:min:sec). So there is a difference in how the year is stored (2018 vs just 17). Is that the problem? Or maybe the time of day (the hour:min:sec)?

 

Thank you for your patience. 

FreelanceReinh
Jade | Level 19

You're welcome.

 

The error messages clearly indicate that event_northing and event_easting are character variables, whereas the GEODIST function requires numeric arguments. You could do the conversion "on the fly" in the PROC SQL step by replacing a.event_northing by input(event_northing, 32.) and similarly for a.event_easting. (As mentioned earlier, you don't need the aliases since there are no name conflicts.) But I would rather create the two new numeric variables in a preliminary step as shown in the example below and use the newly created dataset in the PROC SQL step (i.e., Ultimate_database in place of Final_database). If Final_database is a huge dataset that you wouldn't want to duplicate just for the sake of a variable conversion, we can discuss other options.

/* Create test data */

data Final_database;
input ID event_northing :$20. event_easting :$20. Day :date9.;
format day date9.;
cards;
1 12.21236548 55.14578965 15FEB2017
;

/* Create numeric variables for northing and easting */

data Ultimate_database;
set Final_database(rename=(event_northing=event_northing_char event_easting=event_easting_char));
event_northing=input(event_northing_char, 32.);
event_easting=input(event_easting_char, 32.);
drop event_northing_char event_easting_char; /* <-- optional: remove the character versions */
run;

Similar considerations apply to the Time and eventtime variables. The latter could be converted in the above DATA step, the former in the DATA step creating Data_komplet. The details of the datetime conversions depend on the variable types of Time and eventtime (i.e., character or numeric, please see PROC CONTENTS output of the corresponding datasets) and, if numeric, on the kind of values. Moreover, the first argument of the INTNX function calls must be changed to 'dtyear' if the second argument is a SAS datetime, not a SAS date.
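
For illustration, a minimal sketch of such a datetime conversion, assuming (hypothetically) that eventtime were a character variable holding strings like '14MAY2018:13:15:10'; if PROC CONTENTS shows it as numeric with a DATETIME format, no conversion is needed:

/* Sketch only: convert a character datetime string to a numeric SAS datetime value */
data Ultimate_database2; /* hypothetical dataset name */
set Ultimate_database(rename=(eventtime=eventtime_char));
eventtime=input(eventtime_char, datetime20.); /* the DATETIME. informat reads values like '14MAY2018:13:15:10' */
format eventtime datetime19.;
drop eventtime_char;
run;

The INTNX calls in the PROC SQL step would then use 'dtyear' as their first argument, as mentioned above.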

Mikkel_madsen
Obsidian | Level 7

It seems that converting 'on the fly' with input(event_northing,32.) and input(event_easting,32.) solved the problem! The program runs without any errors now, so I guess there is no problem with the dates. 

 

Thank you very much, @FreelanceReinh. I would have been stuck for weeks without your quick help! Hope I may continue this thread should I encounter other problems in relation to this topic. 

 

Have a nice day. 

FreelanceReinh
Jade | Level 19

Glad to hear that my suggestions worked for you.

 

I think each of the following conditions must be met so that indeed "there is no problem with the dates." I would definitely check this.

  1. Both Time and eventtime are numeric variables in their respective datasets.
  2. When you saw human readable values such as 14MAY2018:13:15:10 (typically with a colon or a "T" between year and hour), you applied a format suitable for SAS datetime values, e.g. datetime19.. Most likely, the format would be permanently associated with the  Time or eventtime variable (see PROC CONTENTS output).
  3. Typical unformatted values of these variables are about 1,800,000,000 for datetimes from 2017-2019, because '01JAN2017:00:00'dt=1798848000 and '01JAN2020:00:00'dt=1893456000. You can check these values with PROC MEANS, e.g.:
    proc means data=Data_komplet min median max;
    var Time;
    run;
    (and similarly for the other dataset).
  4. You used 'dtyear', not 'year' as the time interval in the INTNX function calls, as I suggested.

 

And, yes, of course, feel free to ask if new questions arise (which are so close to the topic of the current thread that opening a new one would seem inappropriate).

Mikkel_madsen
Obsidian | Level 7

Hi again @FreelanceReinh 

 

I have checked the following and can add:

 

1) When I look in the original datasets (Data_komplet and Final_database), 'eventtime' (from Final_database) is numeric with the format datetime19., while 'time' (from Data_komplet) is numeric with the format datetime18. (I right-click on the variable name and select 'Properties', where the type and format, e.g. numeric and datetime19., are shown. The same is found when using PROC CONTENTS.)

 

Is that ok? Or should 'time' and 'eventtime' have the same format (e.g. datetime19. or datetime18.)?

 

2) In both datasets I see human-readable values (in Data_komplet e.g. 12MAY0714:40:00 and in Final_database e.g. 01JUN2018:10:46:45) - that is, there is a difference between the datasets: a "missing" colon between the year and the time in Data_komplet, and a "missing" 20 in the year in Data_komplet. I am not sure whether this will hinder the analyses. 

 

3) In Final_database the median is 1838539767 while the median in Data_komplet is 1789633440 (though I guess this is because I have observations older than 2017 in the Data_komplet dataset e.g. from 2007). 

 

4) I changed 'year' to 'dtyear'. It did, however, result in a significant drop in the counts in d0_500, d500_1000, and d1000_1500?

 

I know I must sound a bit confused, but I'm very thankful for your enormous help! I might have some extra questions and wishes for the program, but I will focus on the problems above first. Once again, thank you! 

FreelanceReinh
Jade | Level 19

Hi @Mikkel_madsen,

 

Thanks for checking. Here are my comments on items 1 - 4:

  1. Numeric is good. Formats have no impact on calculations (because SAS uses the internal values), so the (minor) difference between datetime19. and datetime18. is irrelevant for our purposes.
  2. The formats produce the human-readable values. I'm surprised about the missing colon between date and time in your reported datetime18. value. This omission does not occur with my SAS 9.4 and I have never seen this before. Nevertheless, it would be irrelevant for the analysis.
  3. These median values look plausible. Make sure that this is true for the extreme values (minimum and maximum) as well, i.e., no datetime values from, say, 1960 or 2099 due to data errors.
  4. Well done! The higher numbers in d0_500 etc. obtained with 'year' must have been plain wrong! See the example and further explanations below.

Example:

data test;
input dt;
dt0=intnx('dtyear',dt,0); /* correct */
dtx=intnx('year',dt,0); /* incorrect */
cards;
1838539767
.
;

proc print data=test;
format dt: datetime18.;
run;

Log of the DATA step:

1164  data test;
1165  input dt;
1166  dt0=intnx('dtyear',dt,0); /* correct */
1167  dtx=intnx('year',dt,0); /* incorrect */
1168  cards;

NOTE: Invalid argument to function INTNX('year',1838539767,0) at line 1167 column 5.
RULE:       ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1169        1838539767
dt=1838539767 dt0=1830384000 dtx=. _ERROR_=1 _N_=1
NOTE: Missing values were generated as a result of performing an operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 1166:5   1 at 1167:5
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 1167:5
NOTE: The data set WORK.TEST has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The invalid combination of 'year' in the first and a datetime value (more precisely: a value too large for a SAS date value) in the second argument of INTNX causes the first and the third note in the log and also the black text between the first and the second note. The second note is triggered by the missing value in the second observation, which I added for demonstration.

 

Output of PROC PRINT (btw, note the colon between date and time):

Obs                    dt                   dt0                   dtx

 1       05APR18:09:29:27      01JAN18:00:00:00                     .
 2                      .                     .                     .

The incorrect interval specification 'year' leads to a missing value of dtx, as does (less surprisingly) the missing value in dt itself, whereas dt0 contains the intended start-of-the-year datetime value for non-missing dt.

 

PROC SQL tends to be less verbose in terms of relevant log messages:

proc sql;
select intnx('dtyear',dt,0) as dt0,
       intnx('year',dt,0)   as dtx
from test;
quit;

Log:

1176  proc sql;
1177  select intnx('dtyear',dt,0) as dt0,
1178         intnx('year',dt,0)   as dtx
1179  from test;
NOTE: Invalid argument to function INTNX. Missing values may be generated.
NOTE: Invalid (or missing) arguments to the INTNX function have caused the function to return a missing value.
1180  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Only a one-line note for each of the two issues.

 

However, a WHERE clause in PROC SQL, and likewise a WHERE statement in a DATA step or a WHERE= dataset option, does not write any suspicious note about these data/code issues to the log!

data sel;
set test;
where intnx('year',dt,0)<='01JAN2017:00:00'dt;
run;

Log:

1181  data sel;
1182  set test;
1183  where intnx('year',dt,0)<='01JAN2017:00:00'dt;
1184  run;

NOTE: There were 2 observations read from the data set WORK.TEST.
      WHERE INTNX('year', dt, 0)<='  01JAN2017:00:00:00'DT;
NOTE: The data set WORK.SEL has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Both observations satisfy the WHERE condition, because in both cases intnx('year',dt,0) results in a missing value and missing values are considered smaller than non-missing values. With the correct 'dtyear' the first observation (the datetime from 2018) would clearly not have been selected.

 

So, with the incorrect 'year' interval the first part of the WHERE condition in your program (the first "<=" inequality) was effectively deactivated (namely always true, regardless of the datetime values), hence the additional counts in d0_500 etc. It's always shocking to see how such mistakes can corrupt results while the SAS log is clean.

 

As a precaution it is advisable to exclude missing values in WHERE or IF conditions on the "less than" side of comparison operators. In your example:

... where .z<intnx('dtyear',a.eventtime,0)<=b.Time<=a.eventtime ...

(using the largest special missing value .z, but in most situations the ordinary missing value . would be sufficient). This would also prevent the WHERE condition from being satisfied if both of the two datetime values were missing.
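
A minimal sketch of why this matters (missing values sort below all non-missing values, so a plain "<=" comparison lets them through):

data _null_;
dt = .;                                    /* missing datetime value */
if dt <= '01JAN2017:00:00'dt then
   put 'Plain <=      : the missing value passes the test';
if .z < dt and dt <= '01JAN2017:00:00'dt then
   put 'With .z guard : this line is never written';   /* . sorts below .z */
run;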

 

I have inserted the ".z<" into my earlier posts of this thread in order to demonstrate this defensive programming style, also to later readers.

 

Mikkel_madsen
Obsidian | Level 7

Hi again  @FreelanceReinh 

 

Thank you for your thorough walkthrough. I think I can follow it; however, it's a bit complicated.

I guess that everything works correctly now. I'm using SAS Enterprise Guide v. 9.1; maybe that's why there's a missing colon in the dates? However, if it does not matter for the analysis to be done correctly, then I guess it's okay anyway. 

 

I have also inserted the ".z<" in the program.

 

Thank you for your kindness and help! Have a nice weekend!

Mikkel_madsen
Obsidian | Level 7

Hi @FreelanceReinh 

 

I hope you are well and that I may request some further help from you. 

You helped me generate this programme:

 

 

proc sql;
create table Data_komplet_Geo as
select a.*, 
(select count(*) from Data_komplet b where .z<intnx('dtyear',a.eventtime,0)<=b.createdtime<=a.eventtime 
   & 0<=geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=0.360) as d0_360,
(select count(*) from Data_komplet b where .z<intnx('dtyear',a.eventtime,0)<=b.createdtime<=a.eventtime
   & 0.360<geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=1.08)  as d360_1080,
(select count(*) from Data_komplet b where .z<intnx('dtyear',a.eventtime,0)<=b.createdtime<=a.eventtime
   & 1.08<geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=1.8)  as d1080_1800
from Final_database a;
quit;

 

 

It worked well, but I would now like some more. You might remember my two datasets: "Final_database" and "Data_komplet" (the three datasets I joined). They looked like:

 

Final_database:

 

ID    Longitude       Latitude        Eventtime
1     12.21236548     55.14578965     15FEB201713:15:10
2     12.52319752     55.21460198     19FEB201717:12:10
3     12.41219754     56.41301547     21APR201711:03:11
4     12.14016597     55.49783265     05JUN201808:05:44
5     12.13497326     55.65794325     21DEC201815:59:44
6     12.52319632     55.21446125     26DEC201806:30:12
7     12.41659721     56.41302146     21APR201922:12:42
8     12.14316592     55.49789875     05JUN201902:22:23
9     12.13478134     55.65795469     21DEC2019

 

Data_komplet:

 

Createdtime          Longitude       Latitude
01FEB1712:00:00      12.62315456     55.46532659
21FEB1708:00:00      12.65312348     55.64325689
05MAR1709:00:00      12.54693265     55.21302356
12JUN1712:00:00      12.06837264     55.21363221

 

My 'Data_komplet' database has changed a bit to:

 

Var   Createdtime          Longitude       Latitude        Type        Open         Close        24/7
1     01FEB1712:00:00      12.62315456     55.46532659     Monday      08:00:00     22:00:00     N
1     01FEB1712:00:00      12.62315456     55.46532659     Tuesday     08:00:00     22:00:00     N
1     01FEB1712:00:00      12.62315456     55.46532659     Wednesday   08:00:00     22:00:00     N
1     01FEB1712:00:00      12.62315456     55.46532659     Thursday    08:00:00     22:00:00     N
1     01FEB1712:00:00      12.62315456     55.46532659     Friday      08:00:00     22:00:00     N
2     21FEB1708:00:00      12.65312348     55.64325689     Monday      12:00:00     22:00:00     N
2     21FEB1708:00:00      12.65312348     55.64325689     Tuesday     08:00:00     22:00:00     N
2     21FEB1708:00:00      12.65312348     55.64325689     Wednesday   10:00:00     22:00:00     N
2     21FEB1708:00:00      12.65312348     55.64325689     Thursday    08:00:00     22:00:00     N
3     05MAR1709:00:00      12.54693265     55.21302356     .           .            .            Y
4     12JUN1712:00:00      12.06837264     55.21363221     .           .            .            Y

 

So... The objective is the same: to count how many observations from 'Data_komplet' lie within 0-360, 360-1080 and 1080-1800 meters from the observations in 'Final_database'. But it should only count those where eventtime <= createdtime. Further, it should only count those observations from 'Data_komplet' that are either 24/7 (in the first 'Data_komplet' where you helped me, all observations were 24/7) or those where the day in 'Type' matches the weekday of 'Eventtime' and where the time (from 'eventtime') is within the opening hours (between 'Open' and 'Close'). 

 

Phew. A bit complicated to explain! Hope that it is clear and that you have a suggestion - if it is possible! 

 

All best,

MM

   

 

FreelanceReinh
Jade | Level 19

Hi @Mikkel_madsen,

 


@Mikkel_madsen wrote:

So... The objective is the same: to count how many observations from 'Data_komplet' lie within 0-360, 360-1080 and 1080-1800 meters from the observations in 'Final_database'. But it should only count those where eventtime <= createdtime. Further, it should only count those observations from 'Data_komplet' that are either 24/7 (in the first 'Data_komplet' where you helped me, all observations were 24/7) or those where the day in 'Type' matches the weekday of 'Eventtime' and where the time (from 'eventtime') is within the opening hours (between 'Open' and 'Close').


So you'll need to modify the WHERE conditions to something like the example below (and analogously for the other two distance ranges):

  .z<a.eventtime<=b.createdtime
& 0.360<geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=1.08)
& (b.f247='Y' | put(datepart(a.eventtime),downame.-l)=left(b.type) & b.open<=timepart(a.eventtime)<=b.close)

The second line (the check of the distance) is unchanged. The modified first line reflects your new criterion, which no longer requires eventtime and createdtime to be in the same year (i.e., now an event in, say, 2016 would also meet the condition with createdtime values in 2017, 2018, etc.).

 

The newly added third line contains the new criteria regarding "24/7" and opening hours/weekday. However, this part might need to be changed depending on your data structure:

  • Most likely, the real name of the "24/7" flag variable is not f247 (which is what I used as an example), so please change it correspondingly. Note that "24/7" is not a valid SAS variable name. Possibly the name is '24/7'n, but I would not recommend using these (so called) name literals. It's not even clear from your post whether this is a character variable or perhaps a formatted numeric variable (like 0='N', 1='Y'), which would require a different WHERE condition.
  • Similar ambiguities exist for variable Type. The code above assumes a character variable Type containing values like those shown in your post.
  • Also the last part of the condition involves an assumption I had to make: that variables Open and Close are numeric and contain SAS time values, i.e., numbers of seconds after midnight. If they were character variables (containing values like '08:00:00'), you would convert them "on the fly" like input(b.open,time8.) instead of b.open etc.

Note further that the incomplete eventtime value shown for ID 9 ("21DEC2019") cannot exist in a SAS datetime variable.

Mikkel_madsen
Obsidian | Level 7

Hi @FreelanceReinh 

 

Thank you for always being so helpful and kind! 

 

I made a mistake with the createdtime and eventtime. It should only (as in the first programme) count eventtimes that took place after createdtime - so I left the first line unchanged (as in the old programme). 

 

I have checked my variables:

1) The variable "247" (which is the correct name of the variable) is a yes/no character variable (in the format "$CHAR3.). 

2) The "Type" variable is also a character variable (in the format $CHAR9.). The values in the "Type" variable is the seven different weekdays (e.g. Monday, tuesday etc.). 

3) The variables "Open" and "Close" are both numeric variables with the format TIME8. and the informat TIME11. 

 

Does it require any adjustments? 

 

Thanks again! Have a nice day 🙂

 

All best MM

FreelanceReinh
Jade | Level 19

Hi @Mikkel_madsen,

 

You're welcome.


@Mikkel_madsen wrote:

I have checked my variables:

1) The variable "247" (which is the correct name of the variable) is a yes/no character variable (in the format "$CHAR3.). 

2) The "Type" variable is also a character variable (in the format $CHAR9.). The values in the "Type" variable is the seven different weekdays (e.g. Monday, tuesday etc.). 

3) The variables "Open" and "Close" are both numeric variables with the format TIME8. and the informat TIME11. 

 

Does it require any adjustments? 


1) Only a name literal such as "247"n can start with a digit (but it's displayed without the quotes and the n). If this is the name of the variable and it has length 3 (see format length) because it contains values 'Yes' and 'No' (rather than 'Y' and 'N'), then, of course, the pertinent condition needs to be modified: Instead of

b.f247='Y'

you could use

upcase(b.'247'n) eqt 'Y'

This would work with values 'Yes', 'yes', 'YES', 'Y' and 'y' (left-aligned in case of single characters; otherwise add the LEFT function), even with 'yep' etc.

 

2) The weekday criterion I proposed is case sensitive, i.e., 'Tuesday' would match, but not 'tuesday' or 'TUESDAY'. To be on the safe side, you may want to add the UPCASE function to both sides of the condition:

upcase(put(datepart(a.eventtime),downame.-l))=upcase(left(b.type))

3) The time variable metadata look good.
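
Putting the adjusted pieces together, one of the three subqueries might then look like the sketch below (using the names from this thread, keeping the date condition from your earlier program, and showing only the 0-360 m band; the name literal '247'n is assumed as discussed above):

/* Sketch only */
options validvarname=any; /* likely needed to reference the non-standard variable name "247" via a name literal */
proc sql;
create table Data_komplet_Geo as
select a.*,
(select count(*) from Data_komplet b
   where .z<intnx('dtyear',a.eventtime,0)<=b.createdtime<=a.eventtime
   & 0<=geodist(input(event_northing,32.), input(event_easting,32.), b.latitude, b.longitude)<=0.360
   & (upcase(b.'247'n) eqt 'Y'
      | upcase(put(datepart(a.eventtime),downame.-l))=upcase(left(b.type))
        & b.open<=timepart(a.eventtime)<=b.close)) as d0_360
from Final_database a;
quit;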
