Hi everyone,
I have a data set with subjects and their various visits. Some of the visits have a bin/batch date under which previous visits fall (e.g., subject ID 4 has 4 batches).
My goal is to create a data set with variables that list the various visits in their respective batches.
Obviously, I'm a SAS beginner and have hammered together some code that only partially works. I'd be very grateful for your help on getting this code working! Thank you so much in advance!
Here's a screenshot of Have and Want:
In Want, I'd like there to be a 'Yes' only if a visit falls in a certain batch (my code currently can't do that). I'd also like for the batch dates to be carried through (as opposed to the 'Yes') but only for visits that were part of a batch. In addition, there should be a variable called 'Batch?' that simply says Yes or No depending on whether a visit is part of a batch or not.
Here's my try at it:
data have; infile datalines delimiter=' '; input ID Visit $ Date date11. Binday$; datalines; 1 Week3 21-Dec-2018 No 1 Week6 19-Feb-2019 No 1 Week10 14-May-2019 No 1 Week14 06-Aug-2019 No 1 Week18 30-Oct-2019 No 1 Week22 28-Jan-2020 No 2 Week3 25-Jan-2019 No 2 Week6 26-Mar-2019 No 2 Week10 19-Jun-2019 No 2 Week14 10-Sep-2019 No 2 Week18 04-Dec-2019 Yes 3 Week3 30-Jan-2019 Yes 4 Week3 04-Oct-2017 No 4 Week6 06-Dec-2017 No 4 Week10 28-Feb-2018 Yes 4 Week14 23-May-2018 No 4 Week18 14-Aug-2018 No 4 Week22 07-Nov-2018 Yes 4 Week26 30-Jan-2019 No 4 Week30 24-Apr-2019 No 4 Week34 17-Jul-2019 No 4 Week38 09-Oct-2019 Yes 4 Week40 02-Jan-2020 No 4 Week42 25-Mar-2020 No 4 Week44 17-Jun-2020 No 4 Week46 09-Sep-2020 Yes 4 Week48 03-Dec-2020 No 4 Week50 24-Feb-2021 No 5 Week8 16-Mar-2018 Yes 5 Week11 07-May-2018 No 5 Week14 10-Jul-2018 Yes ; run; /* Create an enumeration variable of changes in Binday for each ID */ data work.want1; do _n_=0 by 1 until(last.ID); do until(last.Binday); set work.have; by ID Binday notsorted; Change=_n_; output; end; end; run; /* Translate changes in Binday to batch numbers */ /* First, remove all No visits */ data work.want2; set work.want1; if Binday ne 'No'; run; /* Second, enumerate again through the Yes visits */ data work.want3; do _n_=1 by 1 until(last.ID); do until(last.Change); set work.want2; by ID Change notsorted; NewChange=_n_; output; end; end; run; /* Create a new batch variable for the transpose below */ data work.want4; set work.want3; Batchdate = CAT('Batchdate', NewChange); run; /* Transpose to get the batch variables in wide format */ proc transpose data=work.want4 out=work.want5(drop=_name_ _label_); by ID; id Batchdate; var Date; run; /* Add the batch dates to the original data set */ proc sql; create table work.have_All as select t1.ID, t1.Visit, t1.Date, t1.Binday, t1.Change, t2.* from work.want1 t1 left join work.want5 t2 on (t1.ID = t2.ID); quit; /* Get all the variable names in the dataset */ proc sql /* noprint */; create table work.varnames as select Name into : vars separated by ' ' from dictionary.columns where libname = upcase('work') and memname = upcase('have_All'); quit; /* Isolate the Batch variable names */ proc sql; create table work.bnames as select distinct t1.Name from work.varnames t1 where Name like 'Batch%'; quit; /* Put those names into a str to be used in the macro below */ proc sql noprint; select Name into :string separated by " " from work.bnames; %let bname = %str(&string); quit; %put &bname; /* Trying to create a macro...but failing */ %macro test(); %let mlength=%sysfunc(countw(&bname)); /*%put &mlength;*/ data work.testresults(drop=i); set work.have_All; array restv[1] Visit; array changev[2] ID Change; array vdates[1] Date; array bdates[&mlength] &bname; array Batch[&mlength] $22; do i=1 to dim(Batch); if vdates[1] <= bdates[i] then Batch[i] = 'Yes'; /* Would prefer Batch[i] = 'Batch[i]' but SAS says No */ /* I've tried a couple of fun things here but they didn't work */ if bdates[i] = '.' then Batch[i] = ''; end; run; /* Change the date format back for the individual variables...but this needs work */ /* data work.results;*/ /* set work.testresults;*/ /* do i=1 to &mlength;*/ /* bname_new[i] = put(&bname, date11.);*/ /* run;*/ /*end;*/ %mend; %test; /*Rename and format some variables - this should be part of the macro (or be a new macro?) so it's also automated */ data work.results_int(drop=Date Change Batchdate1a Batchdate2a Batchdate3a Batchdate4a); set work.testresults(rename=(Batchdate1=Batchdate1a Batchdate2=Batchdate2a Batchdate3=Batchdate3a Batchdate4=Batchdate4a)); VisitDate = put(Date, date11.); Batchdate1 = put(Batchdate1a, date11.); Batchdate2 = put(Batchdate2a, date11.); Batchdate3 = put(Batchdate3a, date11.); Batchdate4 = put(Batchdate4a, date11.); run; proc sort data=results_int sortseq=linguistic(NUMERIC_COLLATION=on); by ID Visit; run; /* Mark visits whether they're part of a batch or not ... which for some reason doesn't work?! Plus, the results aren't ready yet so this variable wouldn't be correct currently. */ data work.results; set work.results_int; if Batchdate1 ne '' then 'Batch?'n = 'Yes'; else 'Batch?'n = 'No'; run;
You seem to mixed up different approaches into one program. Stick to one approach at a time.
So to eliminate the "bin" value from the last group you will need to separated the counting from the actual variable you want to use.
So number the periods or groups in the data and keep track of the last record per group.
data step1 groups(keep=id group date binday rename=(binday=groupday date=groupdate));
set have;
by id date ;
if first.id then group=1;
output step1;
if binday='Yes' or last.id then do;
output groups;
group+1;
end;
run;
Then re-merge then so you can count how many of the groups you want to consider a BIN. And when that last binday value that was saved in the GROUPDAY variable is Yes then use the current BIN number as the name for you BIN.
data step2;
merge step1 groups;
by id group;
if first.id then binnumber=0;
binnumber + (first.group and groupday='Yes');
if groupday='Yes' then bin=binnumber;
run;
Results:
proc print;
by id;
run;
Before we go down this path of turning a long data set into a wide data set (usually not a good idea), please tell us what you can do with the WANT data set that you can't do with the HAVE data set. What is the benefit of doing such a transformation? Most (all?) SAS analytical PROCs require the long (HAVE) data set, rather than the wide (WANT) data set. Even if you want a report or an Excel file that looks like the WANT data set, you can probably do this via a long data set and PROC REPORT.
Hi Paige,
the data will be joined to another data set with one record per subject and information of two reviewers in wide format. Some of the reviewer information will be used only when they fall within certain batches and programming in wide-format is easier for my current skill (and SAS-think) level.
How would you approach getting the Batch1 - Batch4 information that I put next to the Have dataset (column PC)? You could find Bindate = 'Yes' for an ID...but what if you have multiple Bindate = 'Yes'? How to bin dates across rows?
I'm curious - why is it not a good idea to go into wide format?
Thank you!
Nic
@sayno2s wrote:
the data will be joined to another data set with one record per subject and information of two reviewers in wide format. Some of the reviewer information will be used only when they fall within certain batches and programming in wide-format is easier for my current skill (and SAS-think) level.
I'm thinking that this is not enough information. When you have done this join, then what?
I'm curious - why is it not a good idea to go into wide format?
As I said, almost all (or maybe all) SAS analysis PROCs and reporting PROCs want data in the long format, not the wide format. This is a common mistake some people make, thinking that the wide format is better and then they spend a huge amount of work, when it simply isn't necessary. So, while there are exceptions, some situations are better in a wide format, as you can see getting the data into a wide format will require a lot of work, and I want to be sure that the large amount of work is really necessary.
Also, I think very few people here (including me) will try to write any code from your screen capture of Excel. You need to provide the data after it has been read into SAS, as working SAS data step code (not screen capture or any other format), which you can type in yourself, or use this macro.
Your description of how to determine the bins if not very clear.
I kind of looks like the YES values mark the END of a bin. Is that right?
If so you can number the bins with a data step like this. In SAS we would call this a GROUP rather than a BIN. As in BY GROUP processing.
data step1;
set have;
by id date ;
if first.id then group=1;
output;
if binday='Yes' then group+1;
run;
You have made things more complicated by putting the information about whether or not the "bin" is a real bin or not at the END of the data instead of the start. It is much easier to REMEMBER something than to PREDICT THE FUTURE.
But you could just write that last observation out to a separate file and re-merge it to get the information back onto every observation in the group. You will need to set a fixed upper bound on the number of BINDATE variables you want create (or add some extra code to find the maximum value and put it into a macro variable so you can generate that number dynamically)
data step1 groups(keep=id group date binday rename=(binday=groupday date=groupdate));
set have;
by id date ;
if first.id then group=1;
output step1;
if binday='Yes' or last.id then do;
output groups;
group+1;
end;
run;
data want;
do until(last.id);
merge step1 groups;
by id group;
if group=1 then belongs=groupday;
array bins bindate1-bindate6;
bins[group] = groupdate ;
output;
end;
drop groupday groupdate;
format bindate: date9.;
run;
Photograph of Result:
Your example data step seems over complicated. There is no need to add a RUN: statement after the end of the data step. Data steps with in-line data end with the data. You don't have to tell SAS to use a space a delimiter, since that is the default.
data have;
input ID Visit $ Date :date. Binday $;
format date date9.;
datalines;
1 Week3 21-Dec-2018 No
1 Week6 19-Feb-2019 No
1 Week10 14-May-2019 No
1 Week14 06-Aug-2019 No
1 Week18 30-Oct-2019 No
1 Week22 28-Jan-2020 No
2 Week3 25-Jan-2019 No
2 Week6 26-Mar-2019 No
2 Week10 19-Jun-2019 No
2 Week14 10-Sep-2019 No
2 Week18 04-Dec-2019 Yes
3 Week3 30-Jan-2019 Yes
4 Week3 04-Oct-2017 No
4 Week6 06-Dec-2017 No
4 Week10 28-Feb-2018 Yes
4 Week14 23-May-2018 No
4 Week18 14-Aug-2018 No
4 Week22 07-Nov-2018 Yes
4 Week26 30-Jan-2019 No
4 Week30 24-Apr-2019 No
4 Week34 17-Jul-2019 No
4 Week38 09-Oct-2019 Yes
4 Week40 02-Jan-2020 No
4 Week42 25-Mar-2020 No
4 Week44 17-Jun-2020 No
4 Week46 09-Sep-2020 Yes
4 Week48 03-Dec-2020 No
4 Week50 24-Feb-2021 No
5 Week8 16-Mar-2018 Yes
5 Week11 07-May-2018 No
5 Week14 10-Jul-2018 Yes
;
Hi Tom,
Thank you so much for your help, that's almost perfect!
There's one thing not quite there yet that I've tried to solve but couldn't. It's that records where there is no final 'Yes' bin date (to make it a bin) should also be a 'No' in 'belongs' (in red in the screenshot) as those records will be excluded in some calculations. I've tried to use '0' as start for the groups in the first two steps (or 'if groupday = 'No' then group = 0' in the groups dataset), but then the merge doesn't work. What to do?
I figured out how to make the bindates dynamic:
/* I added this proc sql statement and then used the macro variable in your next data step */
proc sql noprint; select max(group) into :maxbin from work.groups; quit; %let maxbin = &maxbin; %put &=maxbin; /* Bindate6 is hard-coded */ data want; do until(last.id); merge step1 groups; by id group; if group=1 then belongs=groupday; /* array bins bindate1-bindate6;*/ array bins bindate1-bindate&maxbin; bins[group] = groupdate ; output; end; drop groupday groupdate; format bindate: date9.; run;
I knew that my pretzel code could be cut down and improved a lot...it's fun to see!
Lots to learn, thank you, and thank you for the Spoiler too!
Hi Tom,
I thought I could start with the last binday, and if that is 'No' then 'belongs' = 0, and do that until the first instance of binday = 'Yes'. However, that's not working.
data want1; set want; do until(last.Binday='Yes'); if last.Binday eq 'No' then belongs=0; by id; end; run;
"That's not working" is awful vague.
Are there errors in the log?: Post the code and log in a code box opened with the "</>" to maintain formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the "</>" icon or attached as text to show exactly what you have and that we can test code against.
The values of the LAST. and FIRST. automatic variables are 1 and 0. Not 'Yes' or 'No'. To test if true then just use the variable:
if last.variablename then ...
If you want when the variable is not true then use NOT
if not (last.variablename) then
You also need to have a BY statement using the variable to create the First. or Last. variables. You do not have Bindate on any BY statement.
All (old and new) code and data and expected output are in the messages above.
For easier testing, here's the new code again:
data have; input ID Visit $ Date :date. Binday $; format date date9.; datalines; 1 Week3 21-Dec-2018 No 1 Week6 19-Feb-2019 No 1 Week10 14-May-2019 No 1 Week14 06-Aug-2019 No 1 Week18 30-Oct-2019 No 1 Week22 28-Jan-2020 No 2 Week3 25-Jan-2019 No 2 Week6 26-Mar-2019 No 2 Week10 19-Jun-2019 No 2 Week14 10-Sep-2019 No 2 Week18 04-Dec-2019 Yes 3 Week3 30-Jan-2019 Yes 4 Week3 04-Oct-2017 No 4 Week6 06-Dec-2017 No 4 Week10 28-Feb-2018 Yes 4 Week14 23-May-2018 No 4 Week18 14-Aug-2018 No 4 Week22 07-Nov-2018 Yes 4 Week26 30-Jan-2019 No 4 Week30 24-Apr-2019 No 4 Week34 17-Jul-2019 No 4 Week38 09-Oct-2019 Yes 4 Week40 02-Jan-2020 No 4 Week42 25-Mar-2020 No 4 Week44 17-Jun-2020 No 4 Week46 09-Sep-2020 Yes 4 Week48 03-Dec-2020 No 4 Week50 24-Feb-2021 No 5 Week8 16-Mar-2018 Yes 5 Week11 07-May-2018 No 5 Week14 10-Jul-2018 Yes ; data step1; set have; by id date ; if first.id then group=1; output; if binday='Yes' then group+1; run; data step1 groups(keep=id group date binday rename=(binday=groupday date=groupdate)); set have; by id date ; if first.id then group=1; output step1; if binday='Yes' or last.id then do; output groups; group+1; end; run; proc sql noprint; select max(group) into :maxbin from work.groups; quit; %let maxbin = &maxbin; %put &=maxbin; data want; do until(last.id); merge step1 groups; by id group; if group=1 then belongs=groupday; array bins bindate1-bindate&maxbin; bins[group] = groupdate ; output; end; drop groupday groupdate; format bindate: date9.; run;
In want, I want to replace values in 'belongs' to 'No' (and ideally in 'group' to 0) if the last records (and only those last ones) for an id are 'Binday' = 'No' until 'Binday' = 'Yes' (see above screenshot) as those records don't belong to a bin.
I've tried this (updated from previous message; and other things) but the code just keeps running on without results:
data want1; set want; if Binday='Yes' then tvar = 1; if Binday='No' then tvar = 0; run; data want2; set want1; do until(last.tvar=1); if last.tvar eq 0 then test = 2; by id tvar; end; run;
I guess first/last variables can't be assigned a value. How can I set a value (i.e., 'Binday' = 'No' ) and then move until the value for that variable changes the first time (i.e., 'Binday' = 'Yes') and make a change to values in another variable (i.e., belongs/group)?
This is the log:
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
25
26 GOPTIONS ACCESSIBLE;
27 data want2;
28 set want1;
29 do until(last.tvar=1);
30 if last.tvar eq 0 then test = 2;
31 by id tvar;
32 end;
33 run;
NOTE: The DATA step has been abnormally terminated.
NOTE: There were 2 observations read from the data set WORK.WANT1.
WARNING: The data set WORK.WANT2 may be incomplete. When this step was stopped there were 0 observations and 13 variables.
WARNING: Data set WORK.WANT2 was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 1:48.35
cpu time 1:47.93
You coded an infinite loop.
data want2;
set want1;
do until(last.tvar=1);
if last.tvar eq 0 then test = 2;
by id tvar;
end;
run;
You said to run until some condition but inside the loop there are no statements that could change the value of that condition. For a FIRST. flag variable to change you have to execute the SET/MERGE/UPDATE statement that the BY statement is associated with.
You referenced a variable (in this case the temporary flag variable FIRST.TVAR) that does not exist. FIRST. and LAST. flags are only created for the variables that are listed in the BY statement.
You seem to mixed up different approaches into one program. Stick to one approach at a time.
So to eliminate the "bin" value from the last group you will need to separated the counting from the actual variable you want to use.
So number the periods or groups in the data and keep track of the last record per group.
data step1 groups(keep=id group date binday rename=(binday=groupday date=groupdate));
set have;
by id date ;
if first.id then group=1;
output step1;
if binday='Yes' or last.id then do;
output groups;
group+1;
end;
run;
Then re-merge then so you can count how many of the groups you want to consider a BIN. And when that last binday value that was saved in the GROUPDAY variable is Yes then use the current BIN number as the name for you BIN.
data step2;
merge step1 groups;
by id group;
if first.id then binnumber=0;
binnumber + (first.group and groupday='Yes');
if groupday='Yes' then bin=binnumber;
run;
Results:
proc print;
by id;
run;
Hi Tom,
This is fantastic, thank you so, so much! If you knew the many rabbit holes in which I took a deep dive...
So I'm hammering all kinds of different approaches together because I'm not skilled enough and bc I thought that's how people do it (maybe too much of 'package' think). I'm happy to learn from your code!
Also, thank you to @PaigeMiller and @ballardw for the cool data to data step tool! I've tried it but it gave me some trouble, which would have taken too much time to solve when the datalines step worked. Definitely worth trouble-shooting in the future.
Thanks again, everyone, for all the help and info!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.