BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
carloshmcaldas
Fluorite | Level 6

Hi folks, I'm trying to identify the time range from a database and I don't know how to do it at SAS:

 

This is how my database is:

state - city - crop - product - week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1    
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1    
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1    
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12    
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12    
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12    
st1   - ct1  - cr1  -  pdct2  -   1  -     1/1    -    10/1    
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1    -    20/1    
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12    -   20/12    
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12    -   31/12    
st1   - ct1  - cr1  -  pdct3  -   5  -    11/1    -    20/2    
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1    -    30/2    
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1    -    10/3    
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1    -    20/3    

I need identify the date that a crop planting starts and when it ends. The issue that some crops plantation starts in one year and ends at the next year. So, my database output should be like this, but didn't figure how to do this on SAS:

 

state - city - crop - product - date_start -  date_end 
st1   - ct1  - cr1  -  pdct1  -    1/12    -    31/1    
st1   - ct1  - cr1  -  pdct2  -   11/12    -    20/1        
st1   - ct1  - cr1  -  pdct3  -    11/1    -    20/3    

 

Ps. Sorry for my bad english.

1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

OK, I think I understand.

 

I have written the following code.  Please note the comments for each step.

**	Read in the raw data	**;
DATA	Crop_Data;
	INFILE	DATALINES4
			DSD	DLM='-'
			FIRSTOBS=2
			;

	FORMAT	state 		$3.
			city 		$3.
			crop 		$3.
			product 	$5.
			Week		$2.
			Date_Start	$5.
			Date_End	$5.
			;
	
	INPUT	State		:	$3.
			City		:	$3.
			Crop		:	$3.
			Product		:	$5.
			Week		:	$2.
			Date_Start	:	$5.
			Date_End	:	$5.
			;

DATALINES4;
state - city - crop - product - Week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12
st1   - ct1  - cr1  -  pdct2  -   1  -    1/1    -    10/1
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1    -    20/1
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12    -   20/12
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12    -   31/12
st1   - ct1  - cr1  -  pdct3  -   5  -   11/1    -    20/2
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1    -    28/2
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1    -    10/3
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1    -    20/3
;;;;
RUN;

**	Sort by State, City, Crop, Product, and Week	**;
PROC	SORT	DATA=Crop_Data;
	BY	State City Crop Product Week;
RUN;

**	If there is a break in sequence, write records after the break to a second dataset	**;
DATA	Crop_Data1	Crop_Data2;
	FORMAT	state 			$3.
			city 			$3.
			crop 			$3.
			product 		$5.
			Week			$2.
			Date_Start		$5.
			Date_End		$5.
			;

	RETAIN	Prior_Week	0;
	RETAIN	Seq_Break	0;

	SET	Crop_Data;
		BY	State City Crop Product;

	IF	FIRST.Product			THEN	
		DO;
			Seq_Break		=	0;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data1;
			DELETE;
		END;
	
	IF	Seq_Break				THEN
		DO;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data2;
			DELETE;
		END;

	IF	Week				=	Prior_Week + 1	THEN
		DO;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data1;
			DELETE;
		END;
	ELSE
		DO;
			Seq_Break		=	1;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data2;
			DELETE;
		END;
RUN;

**	Recombine the data with records from after the sequence break *first*.	**;
DATA	Resequenced_Crop_Data;
	SET	Crop_Data2	Crop_Data1;
RUN;


**	Sort by State City Crop Product but NOT Week.	**;
PROC	SORT	DATA=Resequenced_Crop_Data	EQUALS
				OUT=Resequenced_Crop_Data_Sort;
	BY	State City Crop Product;
RUN;

**	Now that weeks are in the proper sequence, determine the date range.	**;
DATA	Crop_Data_Summarized(RENAME=(Save_Date_Start=Date_Start));
	FORMAT	state 			$3.
			city 			$3.
			crop 			$3.
			product 		$5.
			Week			$2.
			Save_Date_Start	$5.
			Date_Start		$5.
			Date_End		$5.
			;
	KEEP	State City Crop Product Save_Date_Start Date_End;
	SET	Resequenced_Crop_Data_Sort;
		BY	State City Crop Product;
	RETAIN	Save_Date_Start;
	IF	FIRST.Product		THEN
		Save_Date_Start	=	Date_Start;
	IF	LAST.Product	THEN
		OUTPUT;
	ELSE
		DELETE;
RUN;

Which produces:

jimbarbour_0-1601419976958.png

 

How does that look?

 

Jim

View solution in original post

16 REPLIES 16
jimbarbour
Meteorite | Level 14

Are your dates truly just character values like

 1/12 

or are they a DATE or DATETIME value?

 

Jim

carloshmcaldas
Fluorite | Level 6
Date, with dd/mm format
jimbarbour
Meteorite | Level 14

@carloshmcaldas,

 

Would it be possible for you to provide me with real dates, either in SAS date format or in a character format that SAS can read like 2020-09-28.  Take a look at the code below.  Can you change it to give me real dates to work with?

 

DATA	Crop_Data;
	INFILE	DATALINES4
			DSD	DLM='-'
			FIRSTOBS=2
			;
	
	INPUT	State		:	$3.
			City		:	$3.
			Crop		:	$3.
			Product		:	$5.
			Week		:	$2.
			Date_Start	:	$5.
			Date_End	:	$5.
			;

DATALINES4;
state - city - crop - product - week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1    
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1    
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1    
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12    
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12    
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12    
st1   - ct1  - cr1  -  pdct2  -   1  -     1/1    -    10/1    
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1    -    20/1    
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12    -   20/12    
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12    -   31/12    
st1   - ct1  - cr1  -  pdct3  -   5  -    11/1    -    20/2    
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1    -    30/2    
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1    -    10/3    
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1    -    20/3  
;;;;
RUN;

 

The above code gives me the below results, but I need real dates in order to try to solve the problem.

 

jimbarbour_0-1601344419722.png

 

Jim

PGStats
Opal | Level 21

Why is date_start not 1/1 for pdct3?

PG
carloshmcaldas
Fluorite | Level 6

Because this crop plantation start in 11/Jan and ends in 20/Mar.

PGStats
Opal | Level 21

So the crop plantation is not the first date_start? What's the rule for seeing that in the data?

PG
carloshmcaldas
Fluorite | Level 6

The crop plantation definition order is the week sequence.

jimbarbour
Meteorite | Level 14

@carloshmcaldas,

 

Here is one approach.  I modified my code to put in the year.  I chose 2020, but you should put in real years of course.  Then, below that, I added some SQL.  The SQL gives us the earliest date_start and the last date_end.  Is this what you need?  I show the results at the very bottom.

DATA	Crop_Data;
	INFILE	DATALINES4
			DSD	DLM='-'
			FIRSTOBS=2
			;

	FORMAT	state 		$3.
			city 		$3.
			crop 		$3.
			product 	$5.
			weeK		$2.
			Date_Start	DATE9.
			Date_End	DATE9.
			;
	
	INPUT	State		:	$3.
			City		:	$3.
			Crop		:	$3.
			Product		:	$5.
			Week		:	$2.
			Date_Start	:	ddmmyy10.
			Date_End	:	ddmmyy10.
			;

DATALINES4;
state - city - crop - product - week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1/2020    -    10/1/2020
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1/2020    -    20/1/2020
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1/2020    -    31/1/2020
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12/2020    -   10/12/2020
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12/2020    -   20/12/2020
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12/2020    -   31/12/2020
st1   - ct1  - cr1  -  pdct2  -   1  -    1/1/2020    -    10/1/2020
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1/2020    -    20/1/2020
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12/2020    -   20/12/2020
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12/2020    -   31/12/2020
st1   - ct1  - cr1  -  pdct3  -   5  -   11/1/2020    -    20/2/2020
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1/2020    -    28/2/2020
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1/2020    -    10/3/2020
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1/2020    -    20/3/2020
;;;;
RUN;

PROC	SQL;
	CREATE	TABLE	CROP_DATA_RANGES	AS
		SELECT	State
				,City
				,Crop
				,Product
				,MIN(Date_Start)		AS	Date_Start	FORMAT	YYMMDDS10.
				,MAX(Date_End)			AS	Date_End	FORMAT	YYMMDDS10.
			FROM	Crop_Data
			GROUP	BY	State
				,City
				,Crop
				,Product
			;
QUIT;

Results:

 

jimbarbour_0-1601346817682.png

 

Jim

jimbarbour
Meteorite | Level 14

@carloshmcaldas,

 

Was that last bit of code helpful?  I think it is what you need (after you add the real years).

 

Jim

carloshmcaldas
Fluorite | Level 6

no, because we can't use a year value, only date and month. the order is defined by week sequence. If the sequence  start at week 1 but it's interrupted, the correct sequence starts in the next order. 

After the week sequence is solved, whe use the date_start from the first row and date_end from the last row.

Product 1 input:

state - city - crop - product - week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1    
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1    
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1    
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12    
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12    
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12    

Product 1 correct week sequence:

 

state - city - crop - product - week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12    
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12    
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12    
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1    
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1    
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1    

Product 1 correct output:

state - city - crop - product - date_start -  date_end 
st1   - ct1  - cr1  -  pdct1  -    1/12    -    31/1    

 

PGStats
Opal | Level 21

Great! But please explain the sequence for Product 3. That one eludes me.

PG
jimbarbour
Meteorite | Level 14

If the full date is present in the database, I would highly recommend that the full date, including year, be pulled from the database.  The date can be formatted any way you like, but the date should be stored as a full numeric SAS date.

 

Let me see what I can do, but I strongly encourage using the full date from the database.

 

Jim

carloshmcaldas
Fluorite | Level 6

As pdct 3 only  has one week sequence, as week 5, 6, 7 and 8, whe use date_start from first row and date_end from last row.

 

st1   - ct1  - cr1  -  pdct3  -   5  -    11/1    -    20/2    
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1    -    30/2    
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1    -    10/3    
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1    -    20/3 

 

jimbarbour
Meteorite | Level 14

OK, I think I understand.

 

I have written the following code.  Please note the comments for each step.

**	Read in the raw data	**;
DATA	Crop_Data;
	INFILE	DATALINES4
			DSD	DLM='-'
			FIRSTOBS=2
			;

	FORMAT	state 		$3.
			city 		$3.
			crop 		$3.
			product 	$5.
			Week		$2.
			Date_Start	$5.
			Date_End	$5.
			;
	
	INPUT	State		:	$3.
			City		:	$3.
			Crop		:	$3.
			Product		:	$5.
			Week		:	$2.
			Date_Start	:	$5.
			Date_End	:	$5.
			;

DATALINES4;
state - city - crop - product - Week - date_start - date_end
st1   - ct1  - cr1  -  pdct1  -   1  -     1/1    -    10/1
st1   - ct1  - cr1  -  pdct1  -   2  -    11/1    -    20/1
st1   - ct1  - cr1  -  pdct1  -   3  -    21/1    -    31/1
st1   - ct1  - cr1  -  pdct1  -  34  -    1/12    -   10/12
st1   - ct1  - cr1  -  pdct1  -  35  -   11/12    -   20/12
st1   - ct1  - cr1  -  pdct1  -  36  -   21/12    -   31/12
st1   - ct1  - cr1  -  pdct2  -   1  -    1/1    -    10/1
st1   - ct1  - cr1  -  pdct2  -   2  -    11/1    -    20/1
st1   - ct1  - cr1  -  pdct2  -  35  -   11/12    -   20/12
st1   - ct1  - cr1  -  pdct2  -  36  -   21/12    -   31/12
st1   - ct1  - cr1  -  pdct3  -   5  -   11/1    -    20/2
st1   - ct1  - cr1  -  pdct3  -   6  -    21/1    -    28/2
st1   - ct1  - cr1  -  pdct3  -   7  -     1/1    -    10/3
st1   - ct1  - cr1  -  pdct3  -   8  -    11/1    -    20/3
;;;;
RUN;

**	Sort by State, City, Crop, Product, and Week	**;
PROC	SORT	DATA=Crop_Data;
	BY	State City Crop Product Week;
RUN;

**	If there is a break in sequence, write records after the break to a second dataset	**;
DATA	Crop_Data1	Crop_Data2;
	FORMAT	state 			$3.
			city 			$3.
			crop 			$3.
			product 		$5.
			Week			$2.
			Date_Start		$5.
			Date_End		$5.
			;

	RETAIN	Prior_Week	0;
	RETAIN	Seq_Break	0;

	SET	Crop_Data;
		BY	State City Crop Product;

	IF	FIRST.Product			THEN	
		DO;
			Seq_Break		=	0;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data1;
			DELETE;
		END;
	
	IF	Seq_Break				THEN
		DO;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data2;
			DELETE;
		END;

	IF	Week				=	Prior_Week + 1	THEN
		DO;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data1;
			DELETE;
		END;
	ELSE
		DO;
			Seq_Break		=	1;
			Prior_Week		=	Week;
			OUTPUT	Crop_Data2;
			DELETE;
		END;
RUN;

**	Recombine the data with records from after the sequence break *first*.	**;
DATA	Resequenced_Crop_Data;
	SET	Crop_Data2	Crop_Data1;
RUN;


**	Sort by State City Crop Product but NOT Week.	**;
PROC	SORT	DATA=Resequenced_Crop_Data	EQUALS
				OUT=Resequenced_Crop_Data_Sort;
	BY	State City Crop Product;
RUN;

**	Now that weeks are in the proper sequence, determine the date range.	**;
DATA	Crop_Data_Summarized(RENAME=(Save_Date_Start=Date_Start));
	FORMAT	state 			$3.
			city 			$3.
			crop 			$3.
			product 		$5.
			Week			$2.
			Save_Date_Start	$5.
			Date_Start		$5.
			Date_End		$5.
			;
	KEEP	State City Crop Product Save_Date_Start Date_End;
	SET	Resequenced_Crop_Data_Sort;
		BY	State City Crop Product;
	RETAIN	Save_Date_Start;
	IF	FIRST.Product		THEN
		Save_Date_Start	=	Date_Start;
	IF	LAST.Product	THEN
		OUTPUT;
	ELSE
		DELETE;
RUN;

Which produces:

jimbarbour_0-1601419976958.png

 

How does that look?

 

Jim

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1221 views
  • 8 likes
  • 3 in conversation