Hi folks, I'm trying to identify the time range from a database and I don't know how to do it at SAS:
This is how my database is:
state - city - crop - product - week - date_start - date_end st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1 st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1 st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1 st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12 st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12 st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12 st1 - ct1 - cr1 - pdct2 - 1 - 1/1 - 10/1 st1 - ct1 - cr1 - pdct2 - 2 - 11/1 - 20/1 st1 - ct1 - cr1 - pdct2 - 35 - 11/12 - 20/12 st1 - ct1 - cr1 - pdct2 - 36 - 21/12 - 31/12 st1 - ct1 - cr1 - pdct3 - 5 - 11/1 - 20/2 st1 - ct1 - cr1 - pdct3 - 6 - 21/1 - 30/2 st1 - ct1 - cr1 - pdct3 - 7 - 1/1 - 10/3 st1 - ct1 - cr1 - pdct3 - 8 - 11/1 - 20/3
I need identify the date that a crop planting starts and when it ends. The issue that some crops plantation starts in one year and ends at the next year. So, my database output should be like this, but didn't figure how to do this on SAS:
state - city - crop - product - date_start - date_end st1 - ct1 - cr1 - pdct1 - 1/12 - 31/1 st1 - ct1 - cr1 - pdct2 - 11/12 - 20/1 st1 - ct1 - cr1 - pdct3 - 11/1 - 20/3
Ps. Sorry for my bad english.
OK, I think I understand.
I have written the following code. Please note the comments for each step.
**	Read in the raw data	**;
DATA	Crop_Data;
	INFILE	DATALINES4
			DSD	DLM='-'
			FIRSTOBS=2
			;
	FORMAT	state 		$3.
			city 		$3.
			crop 		$3.
			product 	$5.
			Week		$2.
			Date_Start	$5.
			Date_End	$5.
			;
	
	INPUT	State		:	$3.
			City		:	$3.
			Crop		:	$3.
			Product		:	$5.
			Week		:	$2.
			Date_Start	:	$5.
			Date_End	:	$5.
			;
DATALINES4;
state - city - crop - product - Week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12
st1   - ct1  - cr1  -  pdct2  -   1  -    1/1    -    10/1
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1    -    20/1
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12    -   20/12
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12    -   31/12
st1   - ct1  - cr1  -  pdct3  -   5  -   11/1    -    20/2
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1    -    28/2
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1    -    10/3
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1    -    20/3
;;;;
RUN;
**	Sort by State, City, Crop, Product, and Week	**;
PROC	SORT	DATA=Crop_Data;
	BY	State City Crop Product Week;
RUN;
**	If there is a break in sequence, write records after the break to a second dataset	**;
DATA	Crop_Data1	Crop_Data2;
	FORMAT	state 			$3.
			city 			$3.
			crop 			$3.
			product 		$5.
			Week			$2.
			Date_Start		$5.
			Date_End		$5.
			;
	RETAIN	Prior_Week	0;
	RETAIN	Seq_Break	0;
	SET	Crop_Data;
		BY	State City Crop Product;
	IF	FIRST.Product			THEN	
		DO;
			Seq_Break		=	0;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data1;
			DELETE;
		END;
	
	IF	Seq_Break				THEN
		DO;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data2;
			DELETE;
		END;
	IF	Week				=	Prior_Week + 1	THEN
		DO;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data1;
			DELETE;
		END;
	ELSE
		DO;
			Seq_Break		=	1;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data2;
			DELETE;
		END;
RUN;
**	Recombine the data with records from after the sequence break *first*.	**;
DATA	Resequenced_Crop_Data;
	SET	Crop_Data2	Crop_Data1;
RUN;
**	Sort by State City Crop Product but NOT Week.	**;
PROC	SORT	DATA=Resequenced_Crop_Data	EQUALS
				OUT=Resequenced_Crop_Data_Sort;
	BY	State City Crop Product;
RUN;
**	Now that weeks are in the proper sequence, determine the date range.	**;
DATA	Crop_Data_Summarized(RENAME=(Save_Date_Start=Date_Start));
	FORMAT	state 			$3.
			city 			$3.
			crop 			$3.
			product 		$5.
			Week			$2.
			Save_Date_Start	$5.
			Date_Start		$5.
			Date_End		$5.
			;
	KEEP	State City Crop Product Save_Date_Start Date_End;
	SET	Resequenced_Crop_Data_Sort;
		BY	State City Crop Product;
	RETAIN	Save_Date_Start;
	IF	FIRST.Product		THEN
		Save_Date_Start	=	Date_Start;
	IF	LAST.Product	THEN
		OUTPUT;
	ELSE
		DELETE;
RUN;
Which produces:
How does that look?
Jim
Are your dates truly just character values like
1/12
or are they a DATE or DATETIME value?
Jim
Would it be possible for you to provide me with real dates, either in SAS date format or in a character format that SAS can read like 2020-09-28. Take a look at the code below. Can you change it to give me real dates to work with?
DATA	Crop_Data;
	INFILE	DATALINES4
			DSD	DLM='-'
			FIRSTOBS=2
			;
	
	INPUT	State		:	$3.
			City		:	$3.
			Crop		:	$3.
			Product		:	$5.
			Week		:	$2.
			Date_Start	:	$5.
			Date_End	:	$5.
			;
DATALINES4;
state - city - crop - product - week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1    
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1    
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1    
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12    
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12    
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12    
st1   - ct1  - cr1  -  pdct2  -   1  -     1/1    -    10/1    
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1    -    20/1    
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12    -   20/12    
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12    -   31/12    
st1   - ct1  - cr1  -  pdct3  -   5  -    11/1    -    20/2    
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1    -    30/2    
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1    -    10/3    
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1    -    20/3  
;;;;
RUN;
The above code gives me the below results, but I need real dates in order to try to solve the problem.
Jim
Why is date_start not 1/1 for pdct3?
Because this crop plantation start in 11/Jan and ends in 20/Mar.
So the crop plantation is not the first date_start? What's the rule for seeing that in the data?
The crop plantation definition order is the week sequence.
Here is one approach. I modified my code to put in the year. I chose 2020, but you should put in real years of course. Then, below that, I added some SQL. The SQL gives us the earliest date_start and the last date_end. Is this what you need? I show the results at the very bottom.
DATA	Crop_Data;
	INFILE	DATALINES4
			DSD	DLM='-'
			FIRSTOBS=2
			;
	FORMAT	state 		$3.
			city 		$3.
			crop 		$3.
			product 	$5.
			weeK		$2.
			Date_Start	DATE9.
			Date_End	DATE9.
			;
	
	INPUT	State		:	$3.
			City		:	$3.
			Crop		:	$3.
			Product		:	$5.
			Week		:	$2.
			Date_Start	:	ddmmyy10.
			Date_End	:	ddmmyy10.
			;
DATALINES4;
state - city - crop - product - week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1/2020    -    10/1/2020
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1/2020    -    20/1/2020
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1/2020    -    31/1/2020
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12/2020    -   10/12/2020
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12/2020    -   20/12/2020
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12/2020    -   31/12/2020
st1   - ct1  - cr1  -  pdct2  -   1  -    1/1/2020    -    10/1/2020
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1/2020    -    20/1/2020
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12/2020    -   20/12/2020
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12/2020    -   31/12/2020
st1   - ct1  - cr1  -  pdct3  -   5  -   11/1/2020    -    20/2/2020
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1/2020    -    28/2/2020
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1/2020    -    10/3/2020
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1/2020    -    20/3/2020
;;;;
RUN;
PROC	SQL;
	CREATE	TABLE	CROP_DATA_RANGES	AS
		SELECT	State
				,City
				,Crop
				,Product
				,MIN(Date_Start)		AS	Date_Start	FORMAT	YYMMDDS10.
				,MAX(Date_End)			AS	Date_End	FORMAT	YYMMDDS10.
			FROM	Crop_Data
			GROUP	BY	State
				,City
				,Crop
				,Product
			;
QUIT;
Results:
Jim
Was that last bit of code helpful? I think it is what you need (after you add the real years).
Jim
no, because we can't use a year value, only date and month. the order is defined by week sequence. If the sequence start at week 1 but it's interrupted, the correct sequence starts in the next order.
After the week sequence is solved, whe use the date_start from the first row and date_end from the last row.
Product 1 input:
state - city - crop - product - week - date_start - date_end st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1 st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1 st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1 st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12 st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12 st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12
Product 1 correct week sequence:
state - city - crop - product - week - date_start - date_end st1 - ct1 - cr1 - pdct1 - 34 - 1/12 - 10/12 st1 - ct1 - cr1 - pdct1 - 35 - 11/12 - 20/12 st1 - ct1 - cr1 - pdct1 - 36 - 21/12 - 31/12 st1 - ct1 - cr1 - pdct1 - 1 - 1/1 - 10/1 st1 - ct1 - cr1 - pdct1 - 2 - 11/1 - 20/1 st1 - ct1 - cr1 - pdct1 - 3 - 21/1 - 31/1
Product 1 correct output:
state - city - crop - product - date_start - date_end st1 - ct1 - cr1 - pdct1 - 1/12 - 31/1
Great! But please explain the sequence for Product 3. That one eludes me.
If the full date is present in the database, I would highly recommend that the full date, including year, be pulled from the database. The date can be formatted any way you like, but the date should be stored as a full numeric SAS date.
Let me see what I can do, but I strongly encourage using the full date from the database.
Jim
As pdct 3 only has one week sequence, as week 5, 6, 7 and 8, whe use date_start from first row and date_end from last row.
st1 - ct1 - cr1 - pdct3 - 5 - 11/1 - 20/2 st1 - ct1 - cr1 - pdct3 - 6 - 21/1 - 30/2 st1 - ct1 - cr1 - pdct3 - 7 - 1/1 - 10/3 st1 - ct1 - cr1 - pdct3 - 8 - 11/1 - 20/3
OK, I think I understand.
I have written the following code. Please note the comments for each step.
**	Read in the raw data	**;
DATA	Crop_Data;
	INFILE	DATALINES4
			DSD	DLM='-'
			FIRSTOBS=2
			;
	FORMAT	state 		$3.
			city 		$3.
			crop 		$3.
			product 	$5.
			Week		$2.
			Date_Start	$5.
			Date_End	$5.
			;
	
	INPUT	State		:	$3.
			City		:	$3.
			Crop		:	$3.
			Product		:	$5.
			Week		:	$2.
			Date_Start	:	$5.
			Date_End	:	$5.
			;
DATALINES4;
state - city - crop - product - Week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12
st1   - ct1  - cr1  -  pdct2  -   1  -    1/1    -    10/1
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1    -    20/1
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12    -   20/12
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12    -   31/12
st1   - ct1  - cr1  -  pdct3  -   5  -   11/1    -    20/2
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1    -    28/2
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1    -    10/3
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1    -    20/3
;;;;
RUN;
**	Sort by State, City, Crop, Product, and Week	**;
PROC	SORT	DATA=Crop_Data;
	BY	State City Crop Product Week;
RUN;
**	If there is a break in sequence, write records after the break to a second dataset	**;
DATA	Crop_Data1	Crop_Data2;
	FORMAT	state 			$3.
			city 			$3.
			crop 			$3.
			product 		$5.
			Week			$2.
			Date_Start		$5.
			Date_End		$5.
			;
	RETAIN	Prior_Week	0;
	RETAIN	Seq_Break	0;
	SET	Crop_Data;
		BY	State City Crop Product;
	IF	FIRST.Product			THEN	
		DO;
			Seq_Break		=	0;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data1;
			DELETE;
		END;
	
	IF	Seq_Break				THEN
		DO;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data2;
			DELETE;
		END;
	IF	Week				=	Prior_Week + 1	THEN
		DO;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data1;
			DELETE;
		END;
	ELSE
		DO;
			Seq_Break		=	1;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data2;
			DELETE;
		END;
RUN;
**	Recombine the data with records from after the sequence break *first*.	**;
DATA	Resequenced_Crop_Data;
	SET	Crop_Data2	Crop_Data1;
RUN;
**	Sort by State City Crop Product but NOT Week.	**;
PROC	SORT	DATA=Resequenced_Crop_Data	EQUALS
				OUT=Resequenced_Crop_Data_Sort;
	BY	State City Crop Product;
RUN;
**	Now that weeks are in the proper sequence, determine the date range.	**;
DATA	Crop_Data_Summarized(RENAME=(Save_Date_Start=Date_Start));
	FORMAT	state 			$3.
			city 			$3.
			crop 			$3.
			product 		$5.
			Week			$2.
			Save_Date_Start	$5.
			Date_Start		$5.
			Date_End		$5.
			;
	KEEP	State City Crop Product Save_Date_Start Date_End;
	SET	Resequenced_Crop_Data_Sort;
		BY	State City Crop Product;
	RETAIN	Save_Date_Start;
	IF	FIRST.Product		THEN
		Save_Date_Start	=	Date_Start;
	IF	LAST.Product	THEN
		OUTPUT;
	ELSE
		DELETE;
RUN;
Which produces:
How does that look?
Jim
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.