BookmarkSubscribeRSS Feed
CynthiaWei
Obsidian | Level 7

Hi SAS Experts,

I have dataset including dates.

data Have;
	input id $ datestr :$10. Yes_No;
	format date mmddyy10.;
	date = input(datestr!!"-01",yymmdd10.);
	drop datestr;
	datalines;
1	2015-04-01	1
1	2015-04-12	0
1	2015-05-01	0
1	2016-02-14	1
1	2016-07-22	1
2	2015-01-07	0
2	2015-06-25	1
2	2015-07-19	0
2	2015-11-25	1
2	2015-12-24	0
2   2015-12-31  0
2   2016-01-20  0

;
run;

 

I want to create a new table that each record has the date of the observation with Yes_No=1 and then has the following dates of the observations with Yes_No=0 on the same record until next date with Yes_No=1. If there is another record with Yes_No=1 for the same ID, then a new record generated, and so on. Within each ID, every record has the date when Yes_No=1 followed with the dates Yes-No=0. Only special is that if first observation is a Yes_No=0, then leave it there.

 

The dataset I want: (the variables following_date_1-3 can be created as needed)

ID Yes_NO date following_date_1 following_date_2 following_date_3

1 1 4/1/2015      4/12/2015       5/1/2015

1 1 2/1/2016

1 1 7/1/2016

2 0 1/1/2015

2 1 6/24/2015      7/2/2015

2 1 11/25/2015    12/24/2015     12/31/2015     1/20/2016

 

Thank you so much for any help!

Best regards,

C

7 REPLIES 7
jimbarbour
Meteorite | Level 14

Hi, @CynthiaWei,

 

Are you trying to set all your dates to the first of the month?  In order to do that, I believe you would need to change the Informat from $10. to $7.  Also, your data has tabs in it, so for it to be read correctly, I had to add an INFILE with DSD DLM='09'x.  See below.

data Have;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	input id $ datestr :$7. Yes_No;
	format date mmddyy10.;
	date = input(datestr||"-01",yymmdd10.);
	drop datestr;
	datalines;
1	2015-04-01	1
1	2015-04-12	0
1	2015-05-01	0
1	2016-02-14	1
1	2016-07-22	1
2	2015-01-07	0
2	2015-06-25	1
2	2015-07-19	0
2	2015-11-25	1
2	2015-12-24	0
2	2015-12-31	0
2	2016-01-20	0
;
run;
ballardw
Super User

One way:

data need;
   set have;
   retain group;
 
   if Yes_no=1 then   group+1;
run;

proc transpose data=need out=want (drop=group _name_)
   prefix=following_date   
;
   by id group;
   var date;
run;

 

CynthiaWei
Obsidian | Level 7

Thank you so much Ballardw!

 

Is there a way that the value of group starts from 1 for each ID when Yes_No=1. If an ID starts from Yes_No=0 then group starts from 0 and then 1, 2, 3, and so on (like showed below). What we have right now is the value of group is continuous across all IDs.

 

id Yes_No date group
1 1 4/1/2015 1
1 0 4/12/2015 1
1 0 5/1/2015 1
1 1 2/14/2016 2
1 1 7/22/2016 3
2 0 1/7/2015 0
2 1 6/25/2015 1
2 0 7/19/2015 1
2 1 11/25/2015 2
2 0 12/24/2015 2
2 0 12/31/2015 2
2 0 1/20/2016 2

 

Thank you!

best regards,

 

C

ballardw
Super User
data need;
   set have;
   by id;
   retain group;
   if first.id then group=1;
   else if  Yes_no=1 then   group+1;
run;
CynthiaWei
Obsidian | Level 7

Hi Ballardw,

 

Really appreciate your code. It worked very well to meet my analysis requirement.

 

My next step is to create a another table with different date layout.

 

I am having the dataset again:

data Have;
	input id $ datestr :$10. Yes_No;
	format date mmddyy10.;
	date = input(datestr!!"-01",yymmdd10.);
	drop datestr;
	datalines;
1 2015-04-01 1
1 2015-04-12 0
1 2015-05-01 0
1 2016-02-01 1
1 2016-07-01 1
2 2015-01-01 0
2 2015-06-24 1
2 2015-07-02 0
2 2015-11-25 1
2 2015-12-24 0
2 2015-12-31 0
2 2016-01-20 0

;
run;

And, what I want is to create a table that each row represents a month and have all the observations happened in that month, regardless Yes_No=1 or 0, but still have the variable Yes_No in the final dataset. The value of Yes_No for each row is the value for the first observation of that month (i.e. Yes_No=1 for 4/1/2015, Yes_No=0 for 12/24/2015). If there are two or more observations in one month then have them in the same row, just like the table below:

ID Yes_No month dispense_date dispense_date_of_following_1 dispense_date_of_following_2

1 1 2015-04 4/1/2015 4/12/2015

1 0 2015-05 5/1/2015

1 1 2016-02 2/1/2016

1 1 2016-07 7/1/2016

2 0 2015-01 1/1/2015

2 1 2015-06 6/24/2015

2 0 2015-07 7/2/2015

2 1 2015-11 11/25/2015

2 0 2015-12 12/24/2015 12/31/2015

2 0 2016-01 1/20/2016

 

Thank you very much again for the help!!!

Best regards,

C

jimbarbour
Meteorite | Level 14

Cynthia,

 

It's still a little unclear exactly what you want, but I think you want the primary observations, i.e. the observations with Yes_No = 1, to have a date that is the first of the month.  I see the first of the month used in the first four examples of what you want, but the last two are not like that, so I'm not completely sure.  

jimbarbour_0-1623444933359.png

Assuming that you want the Date on the primary observations to fall on the first of the month, here's some code that will give you what you want.  It's fairly complex.  If @ballardw's code gives you exactly what you want, by all means go with that.

data Have;
	INFILE	DATALINES	DSD	DLM='09'X	MISSOVER;
	input id $ datestr :$10. Yes_No;
	format date mmddyy10.;
*	date = input(datestr||"-01",yymmdd10.);
	date = input(datestr,yymmdd10.);
	drop datestr;

	datalines;
1	2015-04-01	1
1	2015-04-12	0
1	2015-05-01	0
1	2016-02-14	1
1	2016-07-22	1
2	2015-01-07	0
2	2015-06-25	1
2	2015-07-19	0
2	2015-11-25	1
2	2015-12-24	0
2	2015-12-31	0
2	2016-01-20	0
;
run;

PROC	SORT	DATA=Have;
	BY ID	Date	DESCENDING Yes_No;
RUN;

PROC	SQL		NOPRINT;
	SELECT	STRIP(PUT(MAX(Date_Cnt), 8.))
		INTO	:	Max_Date_Cols
		FROM
			(SELECT	COUNT(1)	AS	Date_Cnt
				FROM	Have
				GROUP	BY	ID)
				;
QUIT;
%LET	Max_Date_Cols	=	%EVAL(&Max_Date_Cols - 1);
%PUT	NOTE:  &=Max_Date_Cols;

DATA	Have_Grouped;
	LENGTH	_Group	8;

	SET	Have;
		BY	ID;

	IF	First.ID	AND
		NOT	Yes_No	THEN
		_Group	+	1;
	ELSE
	IF	Yes_No	THEN
		_Group	+	1;

RUN;

%MACRO	Init_Dates;
	%DO	i			=	1	%TO	&Max_Date_Cols;
		CALL	MISSING(Following_Date_&i);
	%END;
%MEND	Init_Dates;

DATA	Have_Tpose;
	DROP	_:;
	RETAIN	_First_Time		1;
	LENGTH	_Temp_Date		8;
	LENGTH	_Temp_Yes_No	3;
	FORMAT	_Temp_Date		MMDDYYS10.;

	SET	Have_Grouped	END		=	_No_More_Data;
		BY	_Group;

	FORMAT	Following_Date_1 	-	Following_Date_&Max_Date_Cols	mmddyy10.;
	ARRAY	Dates {*}		8.		Following_Date_1 - Following_Date_&Max_Date_Cols;

	IF	_First_Time	THEN
		DO;
			_First_Time			=	0;
			%Init_Dates;
			DECLARE	HASH			h_Holding_Table(MULTIDATA: 'Y', ORDERED: 'A');
									h_Holding_Table.DEFINEKEY('ID',		'_Temp_Date');
									h_Holding_Table.DEFINEDATA('ID',	'_Temp_Date',	'_Temp_Yes_No');
									h_Holding_Table.DEFINEDONE();
			DECLARE HITER			Iterate('h_Holding_Table');
		END;

	IF	FIRST._Group				AND
		NOT	Yes_No					THEN
		DO;
			Date				=	MDY(MONTH(Date), 1, YEAR(Date));
			OUTPUT;
			DELETE;
		END;

	IF	LAST._Group					THEN
		DO;
			_Temp_Date			=	Date;
			_Temp_Yes_No		=	Yes_No;
			h_Holding_Table.ADD();
			_i					=	0;
			_RC					=	0;
			_RC					=	Iterate.FIRST();
			DO	UNTIL	(_RC);
				_i				+	1;
				IF	_Temp_Yes_No	THEN
					DO;
						_i		+	-1;
						Date	=	MDY(MONTH(_Temp_Date), 1, YEAR(_Temp_Date));
					END;
				ELSE
					Dates{_i}	=	_Temp_Date;
				_RC				=	Iterate.NEXT();
			END;
			Yes_No				=	1;
			OUTPUT;
			_RC					=	Iterate.LAST();
			_RC					=	Iterate.NEXT();
			h_Holding_Table.CLEAR();
		END;
	ELSE
		DO;
			_Temp_Date			=	Date;
			_Temp_Yes_No		=	Yes_No;
			h_Holding_Table.ADD();
		END;
RUN;

Jim

CynthiaWei
Obsidian | Level 7

Hi Jim,

 

Thank you so much for working on my SAS question!

 

I am sorry about the confusion. The texts you highlighted are typos.

 

Let's think the dates are the dates of classes. The value of 1 for Yes_No means the class was taught by a profession, and Yes_No=1 means the classes were taught by a teaching assistant (TA). This is because professions (Yes_No=1) would teach the class only when there are new knowledge that was going to introduce. If the classes were planned to review or practice on the previously introduced knowledge, then instead of having a professor, a teaching assistant (Yes_No=0) would show up. So, what I want is to have each record for a unique knowledge. For example, ID=1 had Excel class introduced by a professor on 2015-04-01 and other two Excel practice classes taught by a TA on 2015-04-12 and 2015_05-01. Then ID=1 had a SAS class (new class) introduced by a professor on 2016-02-14 and a Python class (another new class) on 2016-07-22. Please be aware that not every knowledge has to be taught first by a professor and then by a teaching assistant, it could be only set for one class, and the class was taught by a teaching assistant but a professor, just like the first observation for ID=2, date=2015-01-07 and Yes_No=0. Within same ID, once there were a obs with a value of 1 for Yes_No and followed by one or more obs with Yes_No=0, then these obs should be in the same record in the want dataset. If the first obs within an ID has a value of Yes_No=0, then this record stand as its own.

 

Thank you very much again!

Regards,

Cynthia

 

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 853 views
  • 0 likes
  • 3 in conversation