How do i create a new function/rules for prioritizing number/character...

StickyRoll · Posted 01-27-2025 09:51 PM

How do i create a new function/rules for prioritizing number/character?

I have the following dataset.

Data mock;

trx_date='01Jan2024';

amount=100;

c_type='1';

output;

trx_date='01Jan2024';

amount=200;

c_type='2';

output;

trx_date='01Jan2024';

amount=300;

c_type='3';

output;

trx_date='01Jan2024';

amount=200;

c_type='1 or 3';

output;

trx_date='03Feb2024';

amount=1;

c_type='2';

output;

trx_date='03Feb2024';

amount=1.5;

c_type='3';

output;

trx_date='03Feb2024';

amount=2;

c_type='1 or 2 or 3';

output;

trx_date='31Mar2024';

amount=888;

c_type='1';

output;

trx_date='31Mar2024';

amount=1088;

c_type='2';

output;

run;

In the mockup data, there will be data belonging to 2 different date, 1st Jan 2024 and 3rd Feb 2024.

What i wish to achieve is to look at the data by trx_date as one group.

For trx_date = 1st Jan 2024, c_type will have value of '1' and '2' and '3' and '1 or 3'.

For trx_date = 3rd Feb 2024, c_type will have value of '2' and '3' and '1 or 2 or 3'.

For trx_date = 31st Mar 2024, c_type will have value of '1' and '2' only.

Whenever the value of c_type consist of 'X or X', there will be another level of checking for the "ranking" or "priority" for the c_type as follows.
1 > 3 > 2

In one date period, if there is "X or X" whereby '1' is in the "X Or X" condition, it will check if c_type in the same group of date period has any row of record that consist of '1' (1st in the ranking/priority). If yes, the final value will be '1'. If no, it will check if '3' is in the c_type for the same group of date period (assuming 3 is also in the 'X or X' condition). If yes, the final value would be '3'.

However, if the condition is '1 or 2', it will skip checking for c_type in the same period whether it consist of '3' or not as '3' is not in the 'X or X' condition. Afterall, the function that i wish to create should refer to 'X or X' first.

Worth mentioning here is that it is possible for a date period to not consist of c_type with the value of 'X or X'. Therefore, the function must be dynamic enough to work even if there is no value of 'X or X'.

Based on the example and explanation given here, the final output should be like this:

Data mock;

trx_date='01Jan2024';

amount=100;

c_type='1';

output;

trx_date='01Jan2024';

amount=200;

c_type='2';

output;

trx_date='01Jan2024';

amount=300;

c_type='3';

output;

trx_date='01Jan2024';

amount=200;

c_type='1';

output;

trx_date='03Feb2024';

amount=1;

c_type='2';

output;

trx_date='03Feb2024';

amount=1.5;

c_type='3';

output;

trx_date='03Feb2024';

amount=2;

c_type='3';

output;

trx_date='31Mar2024';

amount=888;

c_type='1';

output;

trx_date='31Mar2024';

amount=1088;

c_type='2';

output;

run;

Currently, i can only think of the following steps to achieve this which i think is not efficient and may not be enough to cover the whole logic.

Step 1: Check if each date has 'X or X'. Flag it with 'Y'.

If yes, proceed with Step 2: Set the ranking of

1 -> 1,

3 -> 2,

2 -> 3

Step 3, for data without 'X or X', direct populate without any logic required.

I am stuck at the part after Step 2 as to how do i define the value for the ranking and at the end, how do i assign the value after that?

quickbluefish · Posted 01-27-2025 10:18 PM

I would start by making your input data easier to follow by writing it something like this:

data mock;
infile cards dsd truncover firstobs=1 dlm=',';
length trx_date 4 amount 8 c_type $20;
informat trx_date date9.;
format trx_date date9.;
input trx_date amount c_type;
cards;
01Jan2024,100,1
01Jan2024,200,2
01Jan2024,300,3
01Jan2024,200,1 or 3
;
run;

Do not use any quotes in the above.

Next, I would dispense with the idea of trying to keep whatever "c_type" is as a character string that might be "X" or "X or X" or "X or X or X". That's just going to be a giant pain. Create 3 variables (e.g., X, Y, Z). If c_type only contains 1 number, only X has a value and Y and Z are missing. Your life will be so much easier and I suspect the solution will become very simple and obvious.

Tom · Posted 01-27-2025 11:24 PM

I am not sure I understand the rule (or how it could be extended to other data).

It would help a lot if you provided the data in a clearer format. Instead of the wallpaper code just post a simple data step with in-line data records.

I think that perhaps you mean you have this input HAVE dataset

data have;
  input trx_date :date. amount c_type $ 15-30 ;
  format trx_date date9.;
cards;
01Jan2024 100  1
01Jan2024 200  2
01Jan2024 300  3
01Jan2024 200  1 or 3
03Feb2024 1    2
03Feb2024 1.5  3
03Feb2024 2    1 or 2 or 3
31Mar2024 888  1
31Mar2024 1088 2
;

and desire to create this WANT dataset.

data want;
  input trx_date :date. amount c_type $ 15-30 ;
  format trx_date date9.;
cards;
01Jan2024 100  1
01Jan2024 200  2
01Jan2024 300  3
01Jan2024 200  1
03Feb2024 1    2
03Feb2024 1.5  3
03Feb2024 2    3
31Mar2024 888  1
31Mar2024 1088 2
;

I am not sure what you mean by 1 < 3 < 2??? Is the issue that your real values are words instead of nunbers and you want to use something other than alphabetical order? So perhaps you have FRED, SALLY and JOE instead of 1, 3 and 2?

Perhaps you could make an INFORMAT that converts the strings into distinct integers in the proper order?

proc format ;
invalue order 
   '1' = 1
   '3' = 2
   '2' = 3
;
run;

Or do you mean that the string with OR keyword are setting ordering?

So that 1 or 3 means check if there are any 1's then check if there are any 3s?

But 3 or 1 would check for 3's first?

Which is it?

As to how to deal with those datasets I would suggest processing all of the observations for a single date in one step (using DOW loop) so that when you get to the last observation you could KNOW whether or not there were any 1's or 3's seen already. That should work assuming the OR records appear at the end.

Something like:

data try1 ;
  do until(last.trx_date);
    set have;
    by trx_date ;
    array found[3] ;
    if not index(c_type,'or') then do;
       n_type = input(c_type,32.);
       found[n_type]=1;
    end;
    else do;
      match=0;
      do index=1 to countw(c_type,' ') by 2 until(match);
        n_type=input(scan(c_type,index,' '),32.);
        match=found[n_type];
      end;
    end;
    output;
  end;
run;

StickyRoll · Posted 02-04-2025 01:59 AM

Hey @Tom , your sample do loop is pretty much what i want to achieve except on the part where the ordering is incorrect.

For Rom 5 to 7, the trx_date 03Feb2024, I would expect the "n_type" resolves to '3' as the ordering is as follows:

1

3

2

Hence, if '1' is not found in the same date period, it will check if '3' exist or not. If yes, the n_value is '3'. '2' is the last in the ordering. Are we able to tweak the loop to cater to this?

Perhaps you could make an INFORMAT that converts the strings into distinct integers in the proper order?

proc format ;
invalue order 
   '1' = 1
   '3' = 2
   '2' = 3
;
run;

You nailed it Tom. Initially i thought of this but without the use of proc format...i was thinking from the point of setting

'1' = 'A'
'3' = 'B'

'2' = 'C'

And i sort base on the alphabet to get the ordering right. However, ur loop has all the things that i wanted except for the ordering (1 to 3 to 2) rather than (1 to 2 to 3).

OR keyword here means that it is possible to have a value with the following string
"1 or 2 or 3" <- this is a valid value for the whole string wrapped with double quotation.

Whenever we see "X or Y", we need to check for the same date period if there is other value consisting on either
'1'

'3'

'2'

If in the same date period where it has value of "1 or 2 or 3", it will set the value based on whatever is found for the same period with the following order:
1

3

2

I hope this is clearer. Sorry for the confusion.

Tom · Posted 02-04-2025 09:03 AM

So the order of the digits in the strings with OR does not matter? "1 or 3" is the same thing as "3 or 1"?

If so you might just build the flag variables and then use a DO loop to check for the first # that is both flagged and in the OR string. So if you have FOUND1 to FOUND3 to indicate if there was every a value of 1, 2 or 3 your loop might look like:

do order=1,3,2 ;
  if found[order] and indexw(c_type,cats(order)) then do;
     n_type=order;
     leave;
  end;
end;

But for just 3 possible flags the loop is probably overkill.

if found1 and indexw(c_type,'1') then n_type=1;
else if found3 and indexw(c_type,'3') then n_type=3;
else if found2 and indexw(c_type,'2') then n_type=2;
else n_type=.;

Ksharp · Posted 01-28-2025 02:44 AM

Data mock;
length c_type $20;

trx_date='01Jan2024';
amount=100;
c_type='1';
output;
 
trx_date='01Jan2024';
amount=200;
c_type='2';
output;
 
trx_date='01Jan2024';
amount=300;
c_type='3';
output;
 
trx_date='01Jan2024';
amount=200;
c_type='1 or 3';
output;
 
trx_date='03Feb2024';
amount=1;
c_type='2';
output;
 
trx_date='03Feb2024';
amount=1.5;
c_type='3';
output;
 
trx_date='03Feb2024';
amount=2;
c_type='1 or 2 or 3';
output;
 
trx_date='31Mar2024';
amount=888;
c_type='1';
output;
 
trx_date='31Mar2024';
amount=1088;
c_type='2';
output;
 
run;

data want;
array _c{9999} $ 40 _temporary_;
call missing(of _c{*});
do i=1 by 1 until(last.trx_date);
 set mock;
 by trx_date;
 _c{i}=c_type;
end;
do until(last.trx_date);
 set mock;
 by trx_date;
 if find(c_type,'or') then do;
     if findw(c_type,'1',,'kd') and '1' in _c then c_type='1';
      else if findw(c_type,'3',,'kd') and '3' in _c then c_type='3';
       else if findw(c_type,'2',,'kd') and '2' in _c then c_type='2';
 end;
 output;
end;
drop i;
run;

SASJedi · Posted 01-29-2025 11:26 AM

Here's my take:

data Have;
   length trx_date amount 8 c_type $15;
	format trx_date date9.;
	trx_date='01Jan2024'd; amount=100; c_type='1'; output;
	trx_date='01Jan2024'd; amount=200; c_type='2'; output;
	trx_date='01Jan2024'd; amount=300; c_type='3'; output;
	trx_date='01Jan2024'd; amount=200; c_type='1 or 3'; output;
	trx_date='03Feb2024'd; amount=1; c_type='2'; output;
	trx_date='03Feb2024'd; amount=1.5; c_type='3'; output;
	trx_date='03Feb2024'd; amount=2; c_type='1 or 2 or 3'; output;
	trx_date='31Mar2024'd; amount=888; c_type='1'; output;
	trx_date='31Mar2024'd; amount=1088; c_type='2'; output;
run;

/* Find the max number of entries for any date, write to macro variable maxCount for sizing our arrays */
proc sql noprint;
select max(count) into :maxCount
   from (select count(*) as Count from have group by trx_date)
;
quit;


data final;
   /* All temp variable names start with _ - drop them all */
	drop _: ;
	length _cs $15;
	/* Set up arrays to hold all amount and c_type values for a date group */
	array a [&maxCount] _temporary_;
	array c [&maxCount] $15 _temporary_;
	/* read in the data you have by trx_date (must be sorted in that order) */
	set have;
	by trx_date;
	/* Reset retained values if this is the first in a new date group */
	if first.trx_date then do;
		call missing (of a[*], of c[*], _count);
	end;
	/* Keep track of which record we're working with */
	_count+1;
	/* Stash the amout and c_type values in the arrays */
	a[_count]=amount;
	c[_count]=c_type;
	/* When I've got all records for this date group, process the decisioning */
	if last.trx_date then do;
	   /* Work through each entry in the arrays */
		do _i=1 to dim(c);
		   /* If this c_type includes 'or' then do the priority checks */
			if find(c[_i],'or') then do;
			   /* concatenate all previous c_type values into a list name _cs */
				do _j=1 to _i-1;
					_cs=cats(_cs,c[_j]);
				end;
				/* If there is a 1 on the list, set c_type to 1 */
				if find(c[_i],'1') and find(_cs,'1') then c_type=1;
				/* otherwise, it there is a 3 on the list, set c_type to 3 */
				else if find(_cs,'3') then c_type=3;
				/* The only other possibility is 2... */
				else c_type=2;
			end;
			/* If there is no 'or' in c_type, get the appropriate value for this record from the array */
			else c_type=c[_i];
			/* Set amount to the appropriate value for this record from the array */
			amount=a[_i];
			/* Sometimes, there were fewer then &maxCount records. If this array value was missing, don't output */
			if not missing(amount) then output;
      end; 
	end;
run;
title "Have";
proc print data=have; run;
title "Final";
proc print data=final; run;

Have

Obs	trx_date	amount	c_type
1	01JAN2024	100.0	1
2	01JAN2024	200.0	2
3	01JAN2024	300.0	3
4	01JAN2024	200.0	1 or 3
5	03FEB2024	1.0	2
6	03FEB2024	1.5	3
7	03FEB2024	2.0	1 or 2 or 3
8	31MAR2024	888.0	1
9	31MAR2024	1088.0	2

Final

Obs	trx_date	amount	c_type
1	01JAN2024	100.0	1
2	01JAN2024	200.0	2
3	01JAN2024	300.0	3
4	01JAN2024	200.0	1
5	03FEB2024	1.0	2
6	03FEB2024	1.5	3
7	03FEB2024	2.0	3
8	31MAR2024	888.0	1
9	31MAR2024	1088.0	2

Check out my Jedi SAS Tricks for SAS Users

StickyRoll · Posted 02-06-2025 02:38 AM

Hello @SASJedi @Tom @Ksharp

Thanks for your excellent codes for reference. For the time being, i have a slight changes to the existing requirements whereby the c_type "1 or 2 or 3" no longer just look at that specific day anymore. Instead, it will look at that specific day AND all day before that day.

Simply put, whenever there is value "1 or 2 or 3" in c_type appears on 03Feb2024, i will have to scan through all days on and before 03Feb2024.

Have

Obs trx_date amount c_type123456789

01JAN2024	100.0	1
01JAN2024	200.0	2
01JAN2024	300.0	3
01JAN2024	200.0	1 or 2
03FEB2024	1.0	1
03FEB2024	1.5	3
03FEB2024	2.0	1 or 2 or 3
31MAR2024	888.0	1
31MAR2024	1088.0	2

Final

Obs trx_date amount c_type123456789

01JAN2024	100.0	1
01JAN2024	200.0	2
01JAN2024	300.0	3
01JAN2024	200.0	1
03FEB2024	1.0	2
03FEB2024	1.5	3
03FEB2024	2.0	1
31MAR2024	888.0	1
31MAR2024	1088.0	2

Can we slightly tune the code to cater to this?

Ksharp · Posted 02-06-2025 03:35 AM

OK.

Assuming "1 or 2 or 3" is always appeared at the end of a trx_date.

Otherwise, you need a RPOC SORT to order the data firstly.

Data mock;
length c_type $20;

trx_date='01Jan2024';
amount=100;
c_type='1';
output;
 
trx_date='01Jan2024';
amount=200;
c_type='2';
output;
 
trx_date='01Jan2024';
amount=300;
c_type='3';
output;
 
trx_date='01Jan2024';
amount=200;
c_type='1 or 3';
output;
 
trx_date='03Feb2024';
amount=1;
c_type='2';
output;
 
trx_date='03Feb2024';
amount=1.5;
c_type='3';
output;
 
trx_date='03Feb2024';
amount=2;
c_type='1 or 2 or 3';
output;
 
trx_date='31Mar2024';
amount=888;
c_type='1';
output;
 
trx_date='31Mar2024';
amount=1088;
c_type='2';
output;
 
run;




data want;
array _c{999999} $ 40 _temporary_;
 set mock;
 _c{_n_}=c_type;
 if find(c_type,'or') then do;
     if findw(c_type,'1',,'kd') and '1' in _c then c_type='1';
      else if findw(c_type,'3',,'kd') and '3' in _c then c_type='3';
       else if findw(c_type,'2',,'kd') and '2' in _c then c_type='2';
 end;
run;

Tom · Posted 02-06-2025 03:18 PM

So just remove the BY group processing.

data have;
  input trx_date :date. amount c_type $ 15-30 ;
  format trx_date date9.;
cards;
01Jan2024 100  1
01Jan2024 200  2
01Jan2024 300  3
01Jan2024 200  1 or 3
03Feb2024 1    2
03Feb2024 1.5  3
03Feb2024 2    1 or 2 or 3
31Mar2024 888  1
31Mar2024 1088 2
;

You can just treat any value that cannot be converted to a number as one to trigger looking back for the "highest" previous value. You could use my previous array method.

data want ;
  set have;
  array found found1-found3 ;
  retain found1-found3;
  n_type = input(c_type,??32.);
  if n_type in (1:3) then found[n_type]=1;
  else do n_type=1,3,2;
    if found[n_type] then leave;
  end;
  if n_type > 3 then n_type=.;
run;

Or just keep a single "highest" value variable to use.

data want2 ;
  set have;
  retain highest ;
  array order [3] _temporary_ (2 3 1);
  n_type = input(c_type,??32.);
  if n_type=1 then highest=1;
  else if n_type=3 and (highest ^= 1) then highest=3;
  else if n_type=2 and (highest=.) then highest=2;
  else n_type=coalesce(n_type,highest);
run;

Your example is probably not the best for testing. Since the first real value is 1 that means any missing values are going to be replaced by 1. To see if the strange ordering works you need different test data.

How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Re: How do i create a new function/rules for prioritizing number/character?

Registration is open

Registration is open

SAS Training: Just a Click Away