- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello SAS programming community,
I have a dataset that I have successfully used the catx function to create a new column called Active Holds. The issue now is a bit more complex for me. Most examples of the web for catx or any cat function illustrate how two column values can be combined into a new column. How do you combined two values from the same column? I created a fake dataset below to help illustrate my question. I would like to take values from the Active Holds column and combine Flex Advisement and Flex Orientation. Using the function catx(delimeter, column, column) does not give the results I need. How is this possible? Any help or guidance would be appreciated. Thank you in advance!
data sample_data;
input STUID 'Active Holds'n& $21. 'Active Holds Date'n;
format 'Active Hold Date'n mmddyy10.;
datalines;
9007741 Flex Advisement 01/20/2022
9007741 Flex Orientation 01/20/2022
9008643 Cashier hold 01/20/2022
9011126 Admissions 01/07/2022
9013751 Cashier hold 12/19/2021
9024453 Admissions2 12/18/2021
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Show the expected output.
Is it only needed when STUID & date is the same but the active hold is different? Just this record? How does it generalize?
Here's an example of how you can combine multiple rows into one:
https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a
@andimo89 wrote:
Hello SAS programming community,
I have a dataset that I have successfully used the catx function to create a new column called Active Holds. The issue now is a bit more complex for me. Most examples of the web for catx or any cat function illustrate how two column values can be combined into a new column. How do you combined two values from the same column? I created a fake dataset below to help illustrate my question. I would like to take values from the Active Holds column and combine Flex Advisement and Flex Orientation. Using the function catx(delimeter, column, column) does not give the results I need. How is this possible? Any help or guidance would be appreciated. Thank you in advance!
data sample_data;
input STUID 'Active Holds'n& $21. 'Active Holds Date'n;
format 'Active Hold Date'n mmddyy10.;
datalines;
9007741 Flex Advisement 01/20/2022
9007741 Flex Orientation 01/20/2022
9008643 Cashier hold 01/20/2022
9011126 Admissions 01/07/2022
9013751 Cashier hold 12/19/20219024453 Admissions2 12/18/2021
;run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi Reeza,
Thank you for your speedy response. So it is needed or grouped by STUID, date and whenever these two rows show up Flex Advisement and Flex Orientation.
The outcome needed should look like this:
9007741 Flex Orientation: Flex Advisement 01/20/2022.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, this may happen. A person may have another hold that is not called Flex Advisement and Flex Orientation.
For example:
9007741 Flex Orientation: Flex Advisement 01/20/2022
9007741 Admissions 01/22/2022
9007741 Cashier 01/24/2022
but they only want to merge when Flex Advisement and Flex Orientation. These two items always show up on the same date per person but they would like only collapse that into one row.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you provide a more generic descriptive rule instead of just listing an example?
It almost appears appears as if the desired behavior is: when the same STUID has more than one record with the same date that the 'Active Hold'n values are combined.
If this is the case then perhaps:
options validvarname=any; data sample_data; infile datalines dlm=','; input STUID 'Active Holds'n& $21. 'Active Hold Date'n:mmddyy10.; format 'Active Hold Date'n mmddyy10.; datalines; 9007741, Flex Advisement, 01/20/2022 9007741, Flex Orientation, 01/20/2022 9008643, Cashier hold, 01/20/2022 9011126, Admissions, 01/07/2022 9013751, Cashier hold, 12/19/2021 9024453, Admissions2, 12/18/2021 ; proc sort data=sample_data; by stuid 'Active Hold Date'n 'Active Holds'n; run; data want; set sample_data; by stuid 'Active Hold Date'n 'Active Holds'n; length holds $ 300; retain holds; /* reset the hold variable for the new date*/ if 'first.Active Hold Date'n then call missing(holds); holds=catx(';',holds,'Active Holds'n ); if 'last.Active Hold Date'n; drop 'Active Holds'n; run;
Note that your input is incorrect, no format to read the date, and the Format statement has a different variable.
The sort is partially because the BY statement works much better with sorted data and partially because this way the order of the combined Holds variable will be consistent in order.
The BY allows use of SAS automatic variables First. and Last. to tell if a group starts or ends. Which allows things like resetting or initializing variables or deciding when you are done with a group. Note that when there is a single record in a group it is both the first and last.
The RETAIN keeps a value across iterations of the step. You need an additional variable to do what you request.
The Call Missing sets the value of the variable to missing.
The IF Last is a subsetting IF and records are only written to the output data set when true.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Thank you for the response. I appreciate you telling about the date format is incorrect. I didn't realize that.
As mentioned in an earlier comment, combination of row information are only for these values: Flex advisement and Flex orientation. When these two pieces of information are available by STUID then concatenate this into one row. As in, Flex Advisement: Flex Orientation with a colon delimiter. I am unsure if I could flag these items first and them concatenate for these only and leave the rest of the values as is. The code you gave is a good starting point and great example to explore.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for this link Reeza. It is greatly appreciated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No time to code this but the simplest solution here may be to sort the data and then iterate through with a data step. Use the LAG() function to check the previous value and if you encounter Flex Orientation and the previous value was Advisement then output that record to a different data set. Then merge that data back in.
Here's one way - I suspect this won't work on your real data. If that's the case, please update your sample data with examples that match the cases where it doesn't work.
data sample_data;
infile datalines dlm=',';
input STUID ActiveHolds & $21. ActiveHoldDate:mmddyy10.;
format ActiveHoldDate mmddyy10.;
datalines;
9007741, Flex Advisement, 01/20/2022
9007741, Flex Orientation, 01/20/2022
9008643, Cashier hold, 01/20/2022
9011126, Admissions, 01/07/2022
9013751, Cashier hold, 12/19/2021
9024453, Admissions2, 12/18/2021
;
proc sort data=sample_data;
by stuid ActiveHoldDate ActiveHolds;
run;
data most flex_orientation (rename=ActiveHolds = ActiveHolds2);;
set sample_data;
by stuid ActiveHoldDate ActiveHolds;
prev_hold = lag(ActiveHolds);
if first.stuid then call missing(prev_hold);
if prev_hold = 'Flex Advisement' & ActiveHolds='Flex Orientation' then output flex_orientation;
else output most;
run;
data want;
merge most flex_orientation;
by stuid activeHoldDate;
ActiveHolds = catx("::", ActiveHolds, ActiveHolds2);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, Reeza. I will try this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Something like this may do the trick:
- Create the data with concatenated values
- Merge them back with the rest of the data
data concat;
merge
sample_data(where=(ActiveHolds='Flex Orientation') in=in1)
sample_data(where=(ActiveHolds='Flex Advisement') in=in2)
;
by STUID ActiveHoldDate;
if in1 and in2 then
new_value='Flex Orientation: Flex Advisement';
else
new_value=ActiveHolds;
drop ActiveHolds;
rename new_value=ActiveHolds;
run;
data want;
set concat
sample_data(where=(ActiveHolds not in(
'Flex Orientation','Flex Advisement')));
by STUID ActiveHoldDate;
run;
Note that the length of the original variable, ActiveHolds, is not long enough to hold the new value. Which is why it is put into a new variable, new_value, which is then renamed. This is also the reason that CONCAT comes before the original dataset in the final SET statement, so that the length of ActiveHolds is the one from the CONCAT table.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content