Hi folks, I'm trying to identify the time range from a database and I don't know how to do it at SAS:
This is how my database is:
state - city - crop - product - week - date_start - date_end st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1 st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1 st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1 st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12 st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12 st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12 st1 - ct1 - cr1 - pdct2 - 1 - 1/1 - 10/1 st1 - ct1 - cr1 - pdct2 - 2 - 11/1 - 20/1 st1 - ct1 - cr1 - pdct2 - 35 - 11/12 - 20/12 st1 - ct1 - cr1 - pdct2 - 36 - 21/12 - 31/12 st1 - ct1 - cr1 - pdct3 - 5 - 11/1 - 20/2 st1 - ct1 - cr1 - pdct3 - 6 - 21/1 - 30/2 st1 - ct1 - cr1 - pdct3 - 7 - 1/1 - 10/3 st1 - ct1 - cr1 - pdct3 - 8 - 11/1 - 20/3
I need identify the date that a crop planting starts and when it ends. The issue that some crops plantation starts in one year and ends at the next year. So, my database output should be like this, but didn't figure how to do this on SAS:
state - city - crop - product - date_start - date_end st1 - ct1 - cr1 - pdct1 - 1/12 - 31/1 st1 - ct1 - cr1 - pdct2 - 11/12 - 20/1 st1 - ct1 - cr1 - pdct3 - 11/1 - 20/3
Ps. Sorry for my bad english.
OK, I think I understand.
I have written the following code. Please note the comments for each step.
** Read in the raw data **;
DATA Crop_Data;
INFILE DATALINES4
DSD DLM='-'
FIRSTOBS=2
;
FORMAT state $3.
city $3.
crop $3.
product $5.
Week $2.
Date_Start $5.
Date_End $5.
;
INPUT State : $3.
City : $3.
Crop : $3.
Product : $5.
Week : $2.
Date_Start : $5.
Date_End : $5.
;
DATALINES4;
state - city - crop - product - Week - date_start - date_end
st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1
st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1
st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1
st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12
st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12
st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12
st1 - ct1 - cr1 - pdct2 - 1 - 1/1 - 10/1
st1 - ct1 - cr1 - pdct2 - 2 - 11/1 - 20/1
st1 - ct1 - cr1 - pdct2 - 35 - 11/12 - 20/12
st1 - ct1 - cr1 - pdct2 - 36 - 21/12 - 31/12
st1 - ct1 - cr1 - pdct3 - 5 - 11/1 - 20/2
st1 - ct1 - cr1 - pdct3 - 6 - 21/1 - 28/2
st1 - ct1 - cr1 - pdct3 - 7 - 1/1 - 10/3
st1 - ct1 - cr1 - pdct3 - 8 - 11/1 - 20/3
;;;;
RUN;
** Sort by State, City, Crop, Product, and Week **;
PROC SORT DATA=Crop_Data;
BY State City Crop Product Week;
RUN;
** If there is a break in sequence, write records after the break to a second dataset **;
DATA Crop_Data1 Crop_Data2;
FORMAT state $3.
city $3.
crop $3.
product $5.
Week $2.
Date_Start $5.
Date_End $5.
;
RETAIN Prior_Week 0;
RETAIN Seq_Break 0;
SET Crop_Data;
BY State City Crop Product;
IF FIRST.Product THEN
DO;
Seq_Break = 0;
Prior_Week = Week;
OUTPUT Crop_Data1;
DELETE;
END;
IF Seq_Break THEN
DO;
Prior_Week = Week;
OUTPUT Crop_Data2;
DELETE;
END;
IF Week = Prior_Week + 1 THEN
DO;
Prior_Week = Week;
OUTPUT Crop_Data1;
DELETE;
END;
ELSE
DO;
Seq_Break = 1;
Prior_Week = Week;
OUTPUT Crop_Data2;
DELETE;
END;
RUN;
** Recombine the data with records from after the sequence break *first*. **;
DATA Resequenced_Crop_Data;
SET Crop_Data2 Crop_Data1;
RUN;
** Sort by State City Crop Product but NOT Week. **;
PROC SORT DATA=Resequenced_Crop_Data EQUALS
OUT=Resequenced_Crop_Data_Sort;
BY State City Crop Product;
RUN;
** Now that weeks are in the proper sequence, determine the date range. **;
DATA Crop_Data_Summarized(RENAME=(Save_Date_Start=Date_Start));
FORMAT state $3.
city $3.
crop $3.
product $5.
Week $2.
Save_Date_Start $5.
Date_Start $5.
Date_End $5.
;
KEEP State City Crop Product Save_Date_Start Date_End;
SET Resequenced_Crop_Data_Sort;
BY State City Crop Product;
RETAIN Save_Date_Start;
IF FIRST.Product THEN
Save_Date_Start = Date_Start;
IF LAST.Product THEN
OUTPUT;
ELSE
DELETE;
RUN;
Which produces:
How does that look?
Jim
Are your dates truly just character values like
1/12
or are they a DATE or DATETIME value?
Jim
Would it be possible for you to provide me with real dates, either in SAS date format or in a character format that SAS can read like 2020-09-28. Take a look at the code below. Can you change it to give me real dates to work with?
DATA Crop_Data;
INFILE DATALINES4
DSD DLM='-'
FIRSTOBS=2
;
INPUT State : $3.
City : $3.
Crop : $3.
Product : $5.
Week : $2.
Date_Start : $5.
Date_End : $5.
;
DATALINES4;
state - city - crop - product - week - date_start - date_end
st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1
st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1
st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1
st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12
st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12
st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12
st1 - ct1 - cr1 - pdct2 - 1 - 1/1 - 10/1
st1 - ct1 - cr1 - pdct2 - 2 - 11/1 - 20/1
st1 - ct1 - cr1 - pdct2 - 35 - 11/12 - 20/12
st1 - ct1 - cr1 - pdct2 - 36 - 21/12 - 31/12
st1 - ct1 - cr1 - pdct3 - 5 - 11/1 - 20/2
st1 - ct1 - cr1 - pdct3 - 6 - 21/1 - 30/2
st1 - ct1 - cr1 - pdct3 - 7 - 1/1 - 10/3
st1 - ct1 - cr1 - pdct3 - 8 - 11/1 - 20/3
;;;;
RUN;
The above code gives me the below results, but I need real dates in order to try to solve the problem.
Jim
Why is date_start not 1/1 for pdct3?
Because this crop plantation start in 11/Jan and ends in 20/Mar.
So the crop plantation is not the first date_start? What's the rule for seeing that in the data?
The crop plantation definition order is the week sequence.
Here is one approach. I modified my code to put in the year. I chose 2020, but you should put in real years of course. Then, below that, I added some SQL. The SQL gives us the earliest date_start and the last date_end. Is this what you need? I show the results at the very bottom.
DATA Crop_Data;
INFILE DATALINES4
DSD DLM='-'
FIRSTOBS=2
;
FORMAT state $3.
city $3.
crop $3.
product $5.
weeK $2.
Date_Start DATE9.
Date_End DATE9.
;
INPUT State : $3.
City : $3.
Crop : $3.
Product : $5.
Week : $2.
Date_Start : ddmmyy10.
Date_End : ddmmyy10.
;
DATALINES4;
state - city - crop - product - week - date_start - date_end
st1 - ct1 - cr1 - pdct1 - 1 - 1/1/2020 - 10/1/2020
st1 - ct1 - cr1 - pdct1 - 2 - 11/1/2020 - 20/1/2020
st1 - ct1 - cr1 - pdct1 - 3 - 21/1/2020 - 31/1/2020
st1 - ct1 - cr1 - pdct1 - 34 - 1/12/2020 - 10/12/2020
st1 - ct1 - cr1 - pdct1 - 35 - 11/12/2020 - 20/12/2020
st1 - ct1 - cr1 - pdct1 - 36 - 21/12/2020 - 31/12/2020
st1 - ct1 - cr1 - pdct2 - 1 - 1/1/2020 - 10/1/2020
st1 - ct1 - cr1 - pdct2 - 2 - 11/1/2020 - 20/1/2020
st1 - ct1 - cr1 - pdct2 - 35 - 11/12/2020 - 20/12/2020
st1 - ct1 - cr1 - pdct2 - 36 - 21/12/2020 - 31/12/2020
st1 - ct1 - cr1 - pdct3 - 5 - 11/1/2020 - 20/2/2020
st1 - ct1 - cr1 - pdct3 - 6 - 21/1/2020 - 28/2/2020
st1 - ct1 - cr1 - pdct3 - 7 - 1/1/2020 - 10/3/2020
st1 - ct1 - cr1 - pdct3 - 8 - 11/1/2020 - 20/3/2020
;;;;
RUN;
PROC SQL;
CREATE TABLE CROP_DATA_RANGES AS
SELECT State
,City
,Crop
,Product
,MIN(Date_Start) AS Date_Start FORMAT YYMMDDS10.
,MAX(Date_End) AS Date_End FORMAT YYMMDDS10.
FROM Crop_Data
GROUP BY State
,City
,Crop
,Product
;
QUIT;
Results:
Jim
Was that last bit of code helpful? I think it is what you need (after you add the real years).
Jim
no, because we can't use a year value, only date and month. the order is defined by week sequence. If the sequence start at week 1 but it's interrupted, the correct sequence starts in the next order.
After the week sequence is solved, whe use the date_start from the first row and date_end from the last row.
Product 1 input:
state - city - crop - product - week - date_start - date_end st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1 st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1 st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1 st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12 st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12 st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12
Product 1 correct week sequence:
state - city - crop - product - week - date_start - date_end st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12 st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12 st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12 st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1 st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1 st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1
Product 1 correct output:
state - city - crop - product - date_start - date_end st1 - ct1 - cr1 - pdct1 - 1/12 - 31/1
Great! But please explain the sequence for Product 3. That one eludes me.
If the full date is present in the database, I would highly recommend that the full date, including year, be pulled from the database. The date can be formatted any way you like, but the date should be stored as a full numeric SAS date.
Let me see what I can do, but I strongly encourage using the full date from the database.
Jim
As pdct 3 only has one week sequence, as week 5, 6, 7 and 8, whe use date_start from first row and date_end from last row.
st1 - ct1 - cr1 - pdct3 - 5 - 11/1 - 20/2 st1 - ct1 - cr1 - pdct3 - 6 - 21/1 - 30/2 st1 - ct1 - cr1 - pdct3 - 7 - 1/1 - 10/3 st1 - ct1 - cr1 - pdct3 - 8 - 11/1 - 20/3
OK, I think I understand.
I have written the following code. Please note the comments for each step.
** Read in the raw data **;
DATA Crop_Data;
INFILE DATALINES4
DSD DLM='-'
FIRSTOBS=2
;
FORMAT state $3.
city $3.
crop $3.
product $5.
Week $2.
Date_Start $5.
Date_End $5.
;
INPUT State : $3.
City : $3.
Crop : $3.
Product : $5.
Week : $2.
Date_Start : $5.
Date_End : $5.
;
DATALINES4;
state - city - crop - product - Week - date_start - date_end
st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1
st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1
st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1
st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12
st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12
st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12
st1 - ct1 - cr1 - pdct2 - 1 - 1/1 - 10/1
st1 - ct1 - cr1 - pdct2 - 2 - 11/1 - 20/1
st1 - ct1 - cr1 - pdct2 - 35 - 11/12 - 20/12
st1 - ct1 - cr1 - pdct2 - 36 - 21/12 - 31/12
st1 - ct1 - cr1 - pdct3 - 5 - 11/1 - 20/2
st1 - ct1 - cr1 - pdct3 - 6 - 21/1 - 28/2
st1 - ct1 - cr1 - pdct3 - 7 - 1/1 - 10/3
st1 - ct1 - cr1 - pdct3 - 8 - 11/1 - 20/3
;;;;
RUN;
** Sort by State, City, Crop, Product, and Week **;
PROC SORT DATA=Crop_Data;
BY State City Crop Product Week;
RUN;
** If there is a break in sequence, write records after the break to a second dataset **;
DATA Crop_Data1 Crop_Data2;
FORMAT state $3.
city $3.
crop $3.
product $5.
Week $2.
Date_Start $5.
Date_End $5.
;
RETAIN Prior_Week 0;
RETAIN Seq_Break 0;
SET Crop_Data;
BY State City Crop Product;
IF FIRST.Product THEN
DO;
Seq_Break = 0;
Prior_Week = Week;
OUTPUT Crop_Data1;
DELETE;
END;
IF Seq_Break THEN
DO;
Prior_Week = Week;
OUTPUT Crop_Data2;
DELETE;
END;
IF Week = Prior_Week + 1 THEN
DO;
Prior_Week = Week;
OUTPUT Crop_Data1;
DELETE;
END;
ELSE
DO;
Seq_Break = 1;
Prior_Week = Week;
OUTPUT Crop_Data2;
DELETE;
END;
RUN;
** Recombine the data with records from after the sequence break *first*. **;
DATA Resequenced_Crop_Data;
SET Crop_Data2 Crop_Data1;
RUN;
** Sort by State City Crop Product but NOT Week. **;
PROC SORT DATA=Resequenced_Crop_Data EQUALS
OUT=Resequenced_Crop_Data_Sort;
BY State City Crop Product;
RUN;
** Now that weeks are in the proper sequence, determine the date range. **;
DATA Crop_Data_Summarized(RENAME=(Save_Date_Start=Date_Start));
FORMAT state $3.
city $3.
crop $3.
product $5.
Week $2.
Save_Date_Start $5.
Date_Start $5.
Date_End $5.
;
KEEP State City Crop Product Save_Date_Start Date_End;
SET Resequenced_Crop_Data_Sort;
BY State City Crop Product;
RETAIN Save_Date_Start;
IF FIRST.Product THEN
Save_Date_Start = Date_Start;
IF LAST.Product THEN
OUTPUT;
ELSE
DELETE;
RUN;
Which produces:
How does that look?
Jim
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.