Hi, @Panagiotis :), I saw the challenge today, and I hurried to solve it. 🙂 I am thankful for it as it made me find and learn two new SAS functions. 🙂 I didn't find the task difficult. However, I had to think a little bit when creating the DATE_TIME and the EQ_PRIMARY variables. I wanted to find a way to compute each of them with a single line of code. I found two SAS functions on Internet very quickly to help me with this task: 1) In another SAS community post named Combining Date and Time into one variable, @Reeza demonstrated how we could use the dhms SAS function to combine date, hour, minute and seconds values in one DateTime variable. 🙂 2) for the EQ_PRIMARY variable, I found this excellent 2017 SAS Global Forum paper Fifteen Functions to Supercharge Your SAS® Code by Joshua M. Horstman. He explains concisely and clearly how someone can use COALESCE and COALESCEC functions to find the first non-missing argument(variable's value) from a given number of numeric or character arguments, respectively. An Excelelnt paper I would recommend to the SAS programming community. 🙂 So, after I quickly found these functions, it wasn't a problem for me to solve the challenge. 🙂 Here is my code: /*Access the SAS Data sets*/
libname quakes "path-to-earthquakes-folder";
/*Step 1: Clean the data*/
data earthquakes_clean;
set quakes.earthquakes_dirty;
/*Enter Code Here*/
ID=scan(ID_REGIONCODE,1,'-');
REGION_CODE=scan(ID_REGIONCODE,-1,'-');
FLAG_TSUNAMI=upcase(FLAG_TSUNAMI);
DATE_TIME=dhms(mdy(month,day,year),hour,minute,seconds);
EQ_PRIMARY=coalesce(of eq:);
format Date_Time datetime. EQ_PRIMARY 4.1;
/*Keep only the following variables*/
keep ID Region_Code Flag_Tsunami Date_Time
EQ_Primary Focal_Depth Country Location_Name;
run;
data earthquakes_clean;
retain ID REGION_CODE FLAG_TSUNAMI DATE_TIME EQ_PRIMARY FOCAL_DEPTH COUNTRY LOCATION_NAME;
set earthquakes_clean;
run;
proc sort data=earthquakes_clean out=earthquakes_clean;
by ID;
run;
/*Step 2: Create a valid and invalid data set*/
data earthquakes_valid(drop=INVALID_DESCRIPTION) invalid(drop=Invalid_ID Invalid_Region_Code Invalid_FLAG_TSUNAMI Invalid_DATE_TIME Invalid_EQ_PRIMARY Invalid_FOCAL_DEPTH);
set earthquakes_clean;
/*Enter Code Here*/
length INVALID_DESCRIPTION $33;
by ID;
if first.ID and last.ID and
REGION_CODE in(1, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90,100, 110, 120, 130, 140, 150, 160, 170) and
FLAG_TSUNAMI in (' ' 'TSU') and
DATE_TIME ne . and
0.0 <= EQ_PRIMARY <= 9.9 and
0 <= FOCAL_DEPTH <= 700 then do;
output earthquakes_valid;
end;
else do;
if first.ID=0 or last.ID=0 then Invalid_ID='ID';
if REGION_CODE not in(1, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90,100, 110, 120, 130, 140, 150, 160, 170) then Invalid_Region_Code='Region Code';
if FLAG_TSUNAMI not in (' ' 'TSU') then Invalid_FLAG_TSUNAMI='Flag Tsunami';
if DATE_TIME eq . then Invalid_DATE_TIME='Date Time';
if EQ_PRIMARY < 0.0 or EQ_PRIMARY > 9.9 then Invalid_EQ_PRIMARY='EQ Primary';
if FOCAL_DEPTH < 0 or FOCAL_DEPTH > 700 then Invalid_FOCAL_DEPTH='Focal Depth';
INVALID_DESCRIPTION=catx(',',Invalid_ID,Invalid_Region_Code,Invalid_FLAG_TSUNAMI,Invalid_DATE_TIME,Invalid_EQ_PRIMARY,Invalid_FOCAL_DEPTH);
output invalid;
end;
run;
/****************************************************************************************************
****************************END OF CHALLENGE, PLEASE READ THE COMMENTS BELOW*************************
****************************************************************************************************/
/************************************************************************
Validate your Results
*************************************************************************
- Please run the following procedures when you are complete to answer the
validation questions in Section 1.3 of your document. After you run the code below,
check the results viewer for the answers.
*************************************************************************/
ods noproctitle;
title "Question 1";
title2 "What is the average magnitude for the EQ_PRIMARY variable in the Earthquakes_valid data set?";
proc means data=earthquakes_valid mean maxdec=2;
var eq_primary;
run;
title "Question 2";
title2 "How many earthquakes have a missing value for DATE_TIME in the Invalid data set?";
proc freq data=invalid;
tables date_time / nocum nopercent missing;
where date_time = .;
run;
title "Question 3";
title2 "How many observations are in the Invalid data set?";
proc sql;
select count(*) as Total_Invalid_Obs
from invalid;
quit;
title;
ods proctitle;
... View more