Hi SAS Experts,
I have dataset including dates.
data Have;
input id $ datestr :$10. Yes_No;
format date mmddyy10.;
date = input(datestr!!"-01",yymmdd10.);
drop datestr;
datalines;
1 2015-04-01 1
1 2015-04-12 0
1 2015-05-01 0
1 2016-02-14 1
1 2016-07-22 1
2 2015-01-07 0
2 2015-06-25 1
2 2015-07-19 0
2 2015-11-25 1
2 2015-12-24 0
2 2015-12-31 0
2 2016-01-20 0
;
run;
I want to create a new table that each record has the date of the observation with Yes_No=1 and then has the following dates of the observations with Yes_No=0 on the same record until next date with Yes_No=1. If there is another record with Yes_No=1 for the same ID, then a new record generated, and so on. Within each ID, every record has the date when Yes_No=1 followed with the dates Yes-No=0. Only special is that if first observation is a Yes_No=0, then leave it there.
The dataset I want: (the variables following_date_1-3 can be created as needed)
ID Yes_NO date following_date_1 following_date_2 following_date_3
1 1 4/1/2015 4/12/2015 5/1/2015
1 1 2/1/2016
1 1 7/1/2016
2 0 1/1/2015
2 1 6/24/2015 7/2/2015
2 1 11/25/2015 12/24/2015 12/31/2015 1/20/2016
Thank you so much for any help!
Best regards,
C
Hi, @CynthiaWei,
Are you trying to set all your dates to the first of the month? In order to do that, I believe you would need to change the Informat from $10. to $7. Also, your data has tabs in it, so for it to be read correctly, I had to add an INFILE with DSD DLM='09'x. See below.
data Have;
INFILE DATALINES DSD DLM='09'X MISSOVER;
input id $ datestr :$7. Yes_No;
format date mmddyy10.;
date = input(datestr||"-01",yymmdd10.);
drop datestr;
datalines;
1 2015-04-01 1
1 2015-04-12 0
1 2015-05-01 0
1 2016-02-14 1
1 2016-07-22 1
2 2015-01-07 0
2 2015-06-25 1
2 2015-07-19 0
2 2015-11-25 1
2 2015-12-24 0
2 2015-12-31 0
2 2016-01-20 0
;
run;
One way:
data need; set have; retain group; if Yes_no=1 then group+1; run; proc transpose data=need out=want (drop=group _name_) prefix=following_date ; by id group; var date; run;
Thank you so much Ballardw!
Is there a way that the value of group starts from 1 for each ID when Yes_No=1. If an ID starts from Yes_No=0 then group starts from 0 and then 1, 2, 3, and so on (like showed below). What we have right now is the value of group is continuous across all IDs.
id | Yes_No | date | group |
1 | 1 | 4/1/2015 | 1 |
1 | 0 | 4/12/2015 | 1 |
1 | 0 | 5/1/2015 | 1 |
1 | 1 | 2/14/2016 | 2 |
1 | 1 | 7/22/2016 | 3 |
2 | 0 | 1/7/2015 | 0 |
2 | 1 | 6/25/2015 | 1 |
2 | 0 | 7/19/2015 | 1 |
2 | 1 | 11/25/2015 | 2 |
2 | 0 | 12/24/2015 | 2 |
2 | 0 | 12/31/2015 | 2 |
2 | 0 | 1/20/2016 | 2 |
Thank you!
best regards,
C
data need; set have; by id; retain group; if first.id then group=1; else if Yes_no=1 then group+1; run;
Hi Ballardw,
Really appreciate your code. It worked very well to meet my analysis requirement.
My next step is to create a another table with different date layout.
I am having the dataset again:
data Have;
input id $ datestr :$10. Yes_No;
format date mmddyy10.;
date = input(datestr!!"-01",yymmdd10.);
drop datestr;
datalines;
1 2015-04-01 1
1 2015-04-12 0
1 2015-05-01 0
1 2016-02-01 1
1 2016-07-01 1
2 2015-01-01 0
2 2015-06-24 1
2 2015-07-02 0
2 2015-11-25 1
2 2015-12-24 0
2 2015-12-31 0
2 2016-01-20 0
;
run;
And, what I want is to create a table that each row represents a month and have all the observations happened in that month, regardless Yes_No=1 or 0, but still have the variable Yes_No in the final dataset. The value of Yes_No for each row is the value for the first observation of that month (i.e. Yes_No=1 for 4/1/2015, Yes_No=0 for 12/24/2015). If there are two or more observations in one month then have them in the same row, just like the table below:
ID Yes_No month dispense_date dispense_date_of_following_1 dispense_date_of_following_2
1 1 2015-04 4/1/2015 4/12/2015
1 0 2015-05 5/1/2015
1 1 2016-02 2/1/2016
1 1 2016-07 7/1/2016
2 0 2015-01 1/1/2015
2 1 2015-06 6/24/2015
2 0 2015-07 7/2/2015
2 1 2015-11 11/25/2015
2 0 2015-12 12/24/2015 12/31/2015
2 0 2016-01 1/20/2016
Thank you very much again for the help!!!
Best regards,
C
Cynthia,
It's still a little unclear exactly what you want, but I think you want the primary observations, i.e. the observations with Yes_No = 1, to have a date that is the first of the month. I see the first of the month used in the first four examples of what you want, but the last two are not like that, so I'm not completely sure.
Assuming that you want the Date on the primary observations to fall on the first of the month, here's some code that will give you what you want. It's fairly complex. If @ballardw's code gives you exactly what you want, by all means go with that.
data Have;
INFILE DATALINES DSD DLM='09'X MISSOVER;
input id $ datestr :$10. Yes_No;
format date mmddyy10.;
* date = input(datestr||"-01",yymmdd10.);
date = input(datestr,yymmdd10.);
drop datestr;
datalines;
1 2015-04-01 1
1 2015-04-12 0
1 2015-05-01 0
1 2016-02-14 1
1 2016-07-22 1
2 2015-01-07 0
2 2015-06-25 1
2 2015-07-19 0
2 2015-11-25 1
2 2015-12-24 0
2 2015-12-31 0
2 2016-01-20 0
;
run;
PROC SORT DATA=Have;
BY ID Date DESCENDING Yes_No;
RUN;
PROC SQL NOPRINT;
SELECT STRIP(PUT(MAX(Date_Cnt), 8.))
INTO : Max_Date_Cols
FROM
(SELECT COUNT(1) AS Date_Cnt
FROM Have
GROUP BY ID)
;
QUIT;
%LET Max_Date_Cols = %EVAL(&Max_Date_Cols - 1);
%PUT NOTE: &=Max_Date_Cols;
DATA Have_Grouped;
LENGTH _Group 8;
SET Have;
BY ID;
IF First.ID AND
NOT Yes_No THEN
_Group + 1;
ELSE
IF Yes_No THEN
_Group + 1;
RUN;
%MACRO Init_Dates;
%DO i = 1 %TO &Max_Date_Cols;
CALL MISSING(Following_Date_&i);
%END;
%MEND Init_Dates;
DATA Have_Tpose;
DROP _:;
RETAIN _First_Time 1;
LENGTH _Temp_Date 8;
LENGTH _Temp_Yes_No 3;
FORMAT _Temp_Date MMDDYYS10.;
SET Have_Grouped END = _No_More_Data;
BY _Group;
FORMAT Following_Date_1 - Following_Date_&Max_Date_Cols mmddyy10.;
ARRAY Dates {*} 8. Following_Date_1 - Following_Date_&Max_Date_Cols;
IF _First_Time THEN
DO;
_First_Time = 0;
%Init_Dates;
DECLARE HASH h_Holding_Table(MULTIDATA: 'Y', ORDERED: 'A');
h_Holding_Table.DEFINEKEY('ID', '_Temp_Date');
h_Holding_Table.DEFINEDATA('ID', '_Temp_Date', '_Temp_Yes_No');
h_Holding_Table.DEFINEDONE();
DECLARE HITER Iterate('h_Holding_Table');
END;
IF FIRST._Group AND
NOT Yes_No THEN
DO;
Date = MDY(MONTH(Date), 1, YEAR(Date));
OUTPUT;
DELETE;
END;
IF LAST._Group THEN
DO;
_Temp_Date = Date;
_Temp_Yes_No = Yes_No;
h_Holding_Table.ADD();
_i = 0;
_RC = 0;
_RC = Iterate.FIRST();
DO UNTIL (_RC);
_i + 1;
IF _Temp_Yes_No THEN
DO;
_i + -1;
Date = MDY(MONTH(_Temp_Date), 1, YEAR(_Temp_Date));
END;
ELSE
Dates{_i} = _Temp_Date;
_RC = Iterate.NEXT();
END;
Yes_No = 1;
OUTPUT;
_RC = Iterate.LAST();
_RC = Iterate.NEXT();
h_Holding_Table.CLEAR();
END;
ELSE
DO;
_Temp_Date = Date;
_Temp_Yes_No = Yes_No;
h_Holding_Table.ADD();
END;
RUN;
Jim
Hi Jim,
Thank you so much for working on my SAS question!
I am sorry about the confusion. The texts you highlighted are typos.
Let's think the dates are the dates of classes. The value of 1 for Yes_No means the class was taught by a profession, and Yes_No=1 means the classes were taught by a teaching assistant (TA). This is because professions (Yes_No=1) would teach the class only when there are new knowledge that was going to introduce. If the classes were planned to review or practice on the previously introduced knowledge, then instead of having a professor, a teaching assistant (Yes_No=0) would show up. So, what I want is to have each record for a unique knowledge. For example, ID=1 had Excel class introduced by a professor on 2015-04-01 and other two Excel practice classes taught by a TA on 2015-04-12 and 2015_05-01. Then ID=1 had a SAS class (new class) introduced by a professor on 2016-02-14 and a Python class (another new class) on 2016-07-22. Please be aware that not every knowledge has to be taught first by a professor and then by a teaching assistant, it could be only set for one class, and the class was taught by a teaching assistant but a professor, just like the first observation for ID=2, date=2015-01-07 and Yes_No=0. Within same ID, once there were a obs with a value of 1 for Yes_No and followed by one or more obs with Yes_No=0, then these obs should be in the same record in the want dataset. If the first obs within an ID has a value of Yes_No=0, then this record stand as its own.
Thank you very much again!
Regards,
Cynthia
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.