BookmarkSubscribeRSS Feed
andimo89
Obsidian | Level 7

Hello SAS programming community, 

 

I have a dataset that I have successfully used the catx function to create a new column called Active Holds. The issue now is a bit more complex for me. Most examples of the web for catx or any cat function illustrate how two column values can be combined into a new column. How do you combined two values from the same column? I created a fake dataset below to help illustrate my question. I would like to take values from the Active Holds column and combine Flex Advisement and Flex Orientation. Using the function catx(delimeter, column, column) does not give the results I need. How is this possible? Any help or guidance would be appreciated. Thank you in advance!

 

data sample_data;
input STUID 'Active Holds'n& $21. 'Active Holds Date'n;
format 'Active Hold Date'n mmddyy10.;
datalines;
9007741 Flex Advisement 01/20/2022
9007741  Flex Orientation 01/20/2022
9008643 Cashier hold 01/20/2022
9011126 Admissions 01/07/2022
9013751 Cashier hold 12/19/2021

9024453 Admissions2 12/18/2021
;

run; 

11 REPLIES 11
Reeza
Super User

Show the expected output. 

 

Is it only needed when STUID & date is the same but the active hold is different? Just this record? How does it generalize?

 

Here's an example of how you can combine multiple rows into one:

https://gist.github.com/statgeek/d583cfa992bf56da51d435165b07e96a

 


@andimo89 wrote:

Hello SAS programming community, 

 

I have a dataset that I have successfully used the catx function to create a new column called Active Holds. The issue now is a bit more complex for me. Most examples of the web for catx or any cat function illustrate how two column values can be combined into a new column. How do you combined two values from the same column? I created a fake dataset below to help illustrate my question. I would like to take values from the Active Holds column and combine Flex Advisement and Flex Orientation. Using the function catx(delimeter, column, column) does not give the results I need. How is this possible? Any help or guidance would be appreciated. Thank you in advance!

 

data sample_data;
input STUID 'Active Holds'n& $21. 'Active Holds Date'n;
format 'Active Hold Date'n mmddyy10.;
datalines;
9007741 Flex Advisement 01/20/2022
9007741  Flex Orientation 01/20/2022
9008643 Cashier hold 01/20/2022
9011126 Admissions 01/07/2022
9013751 Cashier hold 12/19/2021

9024453 Admissions2 12/18/2021
;

run; 


 

andimo89
Obsidian | Level 7

Hi Reeza, 

 

Thank you for your speedy response. So it is needed or grouped by STUID, date and whenever these two rows show up Flex Advisement and Flex Orientation

The outcome needed should look like this: 

9007741  Flex Orientation: Flex Advisement 01/20/2022. 

 

 

 

Reeza
Super User
Could you have other situations where you have duplicate STUDID and Dates and different holds but you don't want those combined? If so, please modify your sample data to account for that please. It does make your problem more complicated for sure though.
andimo89
Obsidian | Level 7

Yes, this may happen. A person may have another hold that is not called Flex Advisement and Flex Orientation. 

For example: 

9007741  Flex Orientation: Flex Advisement 01/20/2022

9007741 Admissions 01/22/2022

9007741 Cashier 01/24/2022

 

but they only want to merge when Flex Advisement and Flex Orientation. These two items always show up on the same date per person but they would like only collapse that into one row. 

ballardw
Super User

Can you provide a more generic descriptive rule instead of just listing an example?

 

It almost appears appears as if the desired behavior is: when the same STUID has more than one record with the same date that the 'Active Hold'n values are combined.

 

If this is the case then perhaps:

options validvarname=any;
data sample_data;
infile datalines dlm=',';
input STUID 'Active Holds'n& $21. 'Active Hold Date'n:mmddyy10.;
format 'Active Hold Date'n mmddyy10.;
datalines;
9007741, Flex Advisement, 01/20/2022
9007741, Flex Orientation, 01/20/2022
9008643, Cashier hold, 01/20/2022
9011126, Admissions, 01/07/2022
9013751, Cashier hold, 12/19/2021
9024453, Admissions2, 12/18/2021
;

proc sort data=sample_data;
   by stuid  'Active Hold Date'n  'Active Holds'n;
run;

data want;
   set sample_data;
   by stuid  'Active Hold Date'n  'Active Holds'n;
   length holds $ 300;
   retain holds;
   /* reset the hold variable for the new date*/
   if 'first.Active Hold Date'n  then call missing(holds);
   holds=catx(';',holds,'Active Holds'n );
   if 'last.Active Hold Date'n;
   drop 'Active Holds'n;
run;

Note that your input is incorrect, no format to read the date, and the Format statement has a different variable.

The sort is partially because the BY statement works much better with sorted data and partially because this way the order of the combined Holds variable will be consistent in order.

The BY allows use of SAS automatic variables First. and Last. to tell if a group starts or ends. Which allows things like resetting or initializing variables or deciding when you are done with a group. Note that when there is a single record in a group it is both the first and last.

The RETAIN keeps a value across iterations of the step. You need an additional variable to do what you request.

The Call Missing sets the value of the variable to missing.

The IF Last is a subsetting IF and records are only written to the output data set when true.

 

 

andimo89
Obsidian | Level 7

Hello, 

 

Thank you for the response. I appreciate you telling about the date format is incorrect. I didn't realize that. 

As mentioned in an earlier comment, combination of row information are only for these values: Flex advisement and Flex orientation. When these two pieces of information are available by STUID then concatenate this into one row. As in, Flex Advisement: Flex Orientation with a colon delimiter. I am unsure if I could flag these items first and them concatenate for these only and leave the rest of the values as is. The code you gave is a good starting point and great example to explore. 

 

andimo89
Obsidian | Level 7

Thank you for this link Reeza. It is greatly appreciated. 

Reeza
Super User

No time to code this but the simplest solution here may be to sort the data and then iterate through with a data step. Use the LAG() function to check the previous value and if you encounter Flex Orientation and the previous value was Advisement then output that record to a different data set. Then merge that data back in.

 

Here's one way - I suspect this won't work on your real data. If that's the case, please update your sample data with examples that match the cases where it doesn't work.

 

data sample_data;
infile datalines dlm=',';
input STUID ActiveHolds & $21. ActiveHoldDate:mmddyy10.;
format ActiveHoldDate mmddyy10.;
datalines;
9007741, Flex Advisement, 01/20/2022
9007741, Flex Orientation, 01/20/2022
9008643, Cashier hold, 01/20/2022
9011126, Admissions, 01/07/2022
9013751, Cashier hold, 12/19/2021
9024453, Admissions2, 12/18/2021
;

proc sort data=sample_data;
by stuid ActiveHoldDate ActiveHolds;
run;


data most flex_orientation (rename=ActiveHolds = ActiveHolds2);;
set sample_data;
by stuid ActiveHoldDate ActiveHolds;

prev_hold = lag(ActiveHolds);
if first.stuid then call missing(prev_hold);

if prev_hold = 'Flex Advisement' & ActiveHolds='Flex Orientation' then output flex_orientation;
else output most;

run;

data want;
merge most flex_orientation;
by stuid activeHoldDate;
ActiveHolds = catx("::", ActiveHolds, ActiveHolds2);
run;
andimo89
Obsidian | Level 7

Thank you, Reeza. I will try this. 

s_lassen
Meteorite | Level 14

Something like this may do the trick:

  1. Create the data with concatenated values
  2. Merge them back with the rest of the data
data concat;                                                   
  merge                                                        
    sample_data(where=(ActiveHolds='Flex Orientation') in=in1) 
    sample_data(where=(ActiveHolds='Flex Advisement') in=in2)  
    ;                                                          
  by STUID ActiveHoldDate;                                     
  if in1 and in2 then                                              
    new_value='Flex Orientation: Flex Advisement';               
  else
    new_value=ActiveHolds;
  drop ActiveHolds;                                            
  rename new_value=ActiveHolds;                                
run;                      
                                     
data want;                                                     
  set concat                                                   
  sample_data(where=(ActiveHolds not in(                       
            'Flex Orientation','Flex Advisement')));           
  by STUID ActiveHoldDate;                                     
run;

Note that the length of the original variable, ActiveHolds, is not long enough to hold the new value. Which is why it is put into a new variable, new_value, which is then renamed. This is also the reason that CONCAT comes before the original dataset in the final SET statement, so that the length of ActiveHolds is the one from the CONCAT table.

andimo89
Obsidian | Level 7
Thank you

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2798 views
  • 3 likes
  • 4 in conversation