BookmarkSubscribeRSS Feed
jmmedina252
Calcite | Level 5

Hello, I need to remove characters from a string based on multiple conditions. 

The data I have looks like this:

Animal     Form        Days                     Days_Expected    Days_Missing

1001        exam        1,2,3,36,103        0,1,2,4,36,103          0,4 

1001        lab            1,2,3,36,100        0,1,2,4,30,100          0,4,30 

 

So I need a function that will remove characters from the missing value based on criteria in the days value.

For instance, If form=exam and days contains '3', then remove '4' from days_missing and if form=exam and days contains '1', then remove '0' from days missing.  Also, if form=lab and days contains '1' then remove '0' from days_missing

 

The outcome wanted would be this:

 

Animal     Form        Days                     Days_Expected    Days_Missing

1001        exam        1,2,3,36,103        0,1,2,4,36,103           

1001        lab            1,2,3,36,100        0,1,2,4,30,100          4,30 

 

12 REPLIES 12
ballardw
Super User

Is that ALL of the rules? Are there other values for "Form" that are going to have conditions?

 

I think that you want to carefully restate your conditions and what is removed. If you remove '0' then '103' becomes '13' if it made it into the "days missing". Similarly the '30' would become '3'.

 

And show the code you are actually using to identify the "days-missing" as it may be better to apply the filters before adding the value to 'days_missing' to begin with. As apparently they are not missing and not supposed to be missing.

jmmedina252
Calcite | Level 5
below is the code to create the list of missing values. the problem is that the list of days expected are not inclusive of when the day is -1(+1) or 28 (+2) ect. There are also about 10 different forms with different criteria on days expected.

*check between list variables;
data missing_days;
set check_formdays;
length Days_missing $1000;
do index=1 to countw(days_expected,',');
if not findw(day_intended,scan(days_expected,index,','),',','T') then
days_missing=catx(',',days_missing,scan(days_expected,index,','))
;
end;
drop index;
run;
tarheel13
Rhodochrosite | Level 12
data have;
   length animal $4 form $4 days $40 days_expected $40 days_missing $40;
   animal='1001';
   form='exam';
   days='1,2,3,36,103';
   days_expected='0,1,2,4,36,103';
   days_missing='0,4';
   output;
   animal='1001';
   form='lab';
   days='1,2,3,36,100';
   days_expected='0,1,2,4,30,100';
   days_missing='0,4,30';
   output;
proc print;
run;

 

mkeintz
PROC Star

@tarheel13 

 

Yes, the transtrn as coded will remove '4,', but it did not remove the trailing '4' (as in '0,4') in the first observation.   And it would also change '0,14,'16' to '0,116'.

 

Edited note:

Earlier, there was a suggestion, now deleted, of using the TRANSTRN function, which prompted my response above.  Here is a way that TRANSTRN can be used without the problems I mentioned.  It requires prepending and appending a ',' to days_missing, then run TRANSTRN against individual values, also preceded and succeeded by commas.  Finally remove the extra commas:

 

data want;
  set have;
  length new_days_missing $42;
  new_days_missing=cats(',',days_missing,',');

  if form='exam' then do;
    if findw(days,'3',',') then new_days_missing=transtrn(new_days_missing,',4,',',');
    if findw(days,'1',',') then new_days_missing=transtrn(new_days_missing,',0,',',');
  end;
  else if form='lab' then do;
    if findw(days,'1',',') then new_days_missing=transtrn(new_days_missing,',0,',',');
  end;
  if new_days_missing=',' then new_days_missing=' ';
  else new_days_missing=substr(new_days_missing,2,length(new_days_missing)-2);
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
tarheel13
Rhodochrosite | Level 12

do you have a better solution?

ballardw
Super User

@tarheel13 wrote:

do you have a better solution?


Since we now have separate step from the original question of "determine days missing" I would say to place the values of days missing into an array, possibly temporary. Then we have NUMERIC values and can match them to (input(scan(days,i)) type code.

Which I would suggest doing at creating days missing instead of continuing with a poor data approach to begin with.

Perhaps after all the rules (any bets we get at least one more set related to this?) have been applied it might be worth sticking multiple values into a single one.

Really reading all of days, expected days into arrays likely makes this a BUNCH simpler in the long run than attempting to deal with character strings.

Quentin
Super User

I saw your other post about how to generate days_missing.  I think you are making life VERY hard for yourself by designing a data structure with these comma separated lists stored in character variables.  Typically things are easier if you work with a vertical structure.  I don't quite understand your data, but it looks like maybe you are keeping track of forms that are due (expected), and whether or not they have been received.

 

Instead of structuring like:

Animal     Form        Days                     Days_Expected    Days_Missing
1001        exam        1,2,3,36,103        0,1,2,4,36,103          0,4 
1001        lab         1,2,3,36,100        0,1,2,4,30,100          0,4,30 

You could structure to have one record per form that is expected, with a boolean variable to indicate whether or not the form was received e.g.:

Animal  Form  Day  Received
1001    exam  0    0
1001    exam  1    1
1001    exam  2    1
1001    exam  4    0
1001    exam  36   1
1001    exam  103  1

1001    lab   0    0
1001    lab   1    1
1001    lab   2    1
1001    lab   4    0
1001    lab   30   0
1001    lab   100  1


With a structure like that, it's much easier to work with.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
A_Kh
Lapis Lazuli | Level 10
data want;
	set have;
	if lowcase(strip(form)) eq 'exam' then do;
		num= countc(days, ',');
		do i=1 to num+1;
			temp= input(scan(days, i, ','), best.);
			if temp eq 3 then do;
				count= countc(days_missing, ',');
				do k=1 to count+1;
					if input(scan(days_missing, k, ','), best.) eq 0 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
				end; 
			end; 
			else if temp eq 1 then do;
				count= countc(days_missing, ',');
				do k=1 to count+1;
					if input(scan(days_missing, k, ','), best.) eq 4 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
				end; 
			end; 
		end;
	end; 
	else if lowcase(strip(form)) eq 'lab' then do;
		num= countc(days, ',');
		do i=1 to num+1;
			temp= input(scan(days, i, ','), best.);
			if temp eq 1 then do;
				count= countc(days_missing, ',');
				do k=1 to count+1;
					if input(scan(days_missing, k, ','), best.) eq 0 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
				end; 
			end; 
		end;
	end; 
	if substr(strip(days_missing), 1, 1) eq ',' then days_missing=substrn(strip(days_missing), 2);
	drop num temp count i k;
run;
tarheel13
Rhodochrosite | Level 12

for the second row, it gets days_missing='4,3' rather than '4,30'

Oligolas
Barite | Level 11

Hi,

Use a regular expression:

data want;
   set have;
   length days_missing2 $30;
   days_missing2=days_missing;
   if form in ('exam' 'lab') then do;
      days_missing2=prxchange('s/\D0(?!\d)//',-1,days_missing2);
      days_missing2=prxchange('s/(?<!\d)0\D//',-1,days_missing2);
   end;
   if form='exam' then do;
      days_missing2=prxchange('s/\D4(?!\d)//',-1,days_missing2);
      days_missing2=prxchange('s/(?<!\d)4\D//',-1,days_missing2);
   end;
run;  

 

________________________

- Cheers -

tarheel13
Rhodochrosite | Level 12

I find regular expressions are still underused by most SAS programmers. Great solution!

Oligolas
Barite | Level 11

Edit: the check on days must be added

data want;
   set have;
   length days_missing2 $30;
   days_missing2=days_missing;
   if form in ('exam' 'lab') and prxmatch('/(?<!\d)1(?!\d)/',days) then do;
      days_missing2=prxchange('s/\D0(?!\d)//',-1,days_missing2);
      days_missing2=prxchange('s/(?<!\d)0\D//',-1,days_missing2);
   end;
   if form='exam' and prxmatch('/(?<!\d)3(?!\d)/',days)then do;
      days_missing2=prxchange('s/\D4(?!\d)//',-1,days_missing2);
      days_missing2=prxchange('s/(?<!\d)4\D//',-1,days_missing2);
   end;
run;  

 

________________________

- Cheers -

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 1653 views
  • 3 likes
  • 7 in conversation