BookmarkSubscribeRSS Feed
jmmedina252
Calcite | Level 5

Hello, I need to remove characters from a string based on multiple conditions. 

The data I have looks like this:

Animal     Form        Days                     Days_Expected    Days_Missing

1001        exam        1,2,3,36,103        0,1,2,4,36,103          0,4 

1001        lab            1,2,3,36,100        0,1,2,4,30,100          0,4,30 

 

So I need a function that will remove characters from the missing value based on criteria in the days value.

For instance, If form=exam and days contains '3', then remove '4' from days_missing and if form=exam and days contains '1', then remove '0' from days missing.  Also, if form=lab and days contains '1' then remove '0' from days_missing

 

The outcome wanted would be this:

 

Animal     Form        Days                     Days_Expected    Days_Missing

1001        exam        1,2,3,36,103        0,1,2,4,36,103           

1001        lab            1,2,3,36,100        0,1,2,4,30,100          4,30 

 

12 REPLIES 12
ballardw
Super User

Is that ALL of the rules? Are there other values for "Form" that are going to have conditions?

 

I think that you want to carefully restate your conditions and what is removed. If you remove '0' then '103' becomes '13' if it made it into the "days missing". Similarly the '30' would become '3'.

 

And show the code you are actually using to identify the "days-missing" as it may be better to apply the filters before adding the value to 'days_missing' to begin with. As apparently they are not missing and not supposed to be missing.

jmmedina252
Calcite | Level 5
below is the code to create the list of missing values. the problem is that the list of days expected are not inclusive of when the day is -1(+1) or 28 (+2) ect. There are also about 10 different forms with different criteria on days expected.

*check between list variables;
data missing_days;
set check_formdays;
length Days_missing $1000;
do index=1 to countw(days_expected,',');
if not findw(day_intended,scan(days_expected,index,','),',','T') then
days_missing=catx(',',days_missing,scan(days_expected,index,','))
;
end;
drop index;
run;
tarheel13
Rhodochrosite | Level 12
data have;
   length animal $4 form $4 days $40 days_expected $40 days_missing $40;
   animal='1001';
   form='exam';
   days='1,2,3,36,103';
   days_expected='0,1,2,4,36,103';
   days_missing='0,4';
   output;
   animal='1001';
   form='lab';
   days='1,2,3,36,100';
   days_expected='0,1,2,4,30,100';
   days_missing='0,4,30';
   output;
proc print;
run;

 

mkeintz
PROC Star

@tarheel13 

 

Yes, the transtrn as coded will remove '4,', but it did not remove the trailing '4' (as in '0,4') in the first observation.   And it would also change '0,14,'16' to '0,116'.

 

Edited note:

Earlier, there was a suggestion, now deleted, of using the TRANSTRN function, which prompted my response above.  Here is a way that TRANSTRN can be used without the problems I mentioned.  It requires prepending and appending a ',' to days_missing, then run TRANSTRN against individual values, also preceded and succeeded by commas.  Finally remove the extra commas:

 

data want;
  set have;
  length new_days_missing $42;
  new_days_missing=cats(',',days_missing,',');

  if form='exam' then do;
    if findw(days,'3',',') then new_days_missing=transtrn(new_days_missing,',4,',',');
    if findw(days,'1',',') then new_days_missing=transtrn(new_days_missing,',0,',',');
  end;
  else if form='lab' then do;
    if findw(days,'1',',') then new_days_missing=transtrn(new_days_missing,',0,',',');
  end;
  if new_days_missing=',' then new_days_missing=' ';
  else new_days_missing=substr(new_days_missing,2,length(new_days_missing)-2);
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
tarheel13
Rhodochrosite | Level 12

do you have a better solution?

ballardw
Super User

@tarheel13 wrote:

do you have a better solution?


Since we now have separate step from the original question of "determine days missing" I would say to place the values of days missing into an array, possibly temporary. Then we have NUMERIC values and can match them to (input(scan(days,i)) type code.

Which I would suggest doing at creating days missing instead of continuing with a poor data approach to begin with.

Perhaps after all the rules (any bets we get at least one more set related to this?) have been applied it might be worth sticking multiple values into a single one.

Really reading all of days, expected days into arrays likely makes this a BUNCH simpler in the long run than attempting to deal with character strings.

Quentin
Super User

I saw your other post about how to generate days_missing.  I think you are making life VERY hard for yourself by designing a data structure with these comma separated lists stored in character variables.  Typically things are easier if you work with a vertical structure.  I don't quite understand your data, but it looks like maybe you are keeping track of forms that are due (expected), and whether or not they have been received.

 

Instead of structuring like:

Animal     Form        Days                     Days_Expected    Days_Missing
1001        exam        1,2,3,36,103        0,1,2,4,36,103          0,4 
1001        lab         1,2,3,36,100        0,1,2,4,30,100          0,4,30 

You could structure to have one record per form that is expected, with a boolean variable to indicate whether or not the form was received e.g.:

Animal  Form  Day  Received
1001    exam  0    0
1001    exam  1    1
1001    exam  2    1
1001    exam  4    0
1001    exam  36   1
1001    exam  103  1

1001    lab   0    0
1001    lab   1    1
1001    lab   2    1
1001    lab   4    0
1001    lab   30   0
1001    lab   100  1


With a structure like that, it's much easier to work with.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
A_Kh
Lapis Lazuli | Level 10
data want;
	set have;
	if lowcase(strip(form)) eq 'exam' then do;
		num= countc(days, ',');
		do i=1 to num+1;
			temp= input(scan(days, i, ','), best.);
			if temp eq 3 then do;
				count= countc(days_missing, ',');
				do k=1 to count+1;
					if input(scan(days_missing, k, ','), best.) eq 0 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
				end; 
			end; 
			else if temp eq 1 then do;
				count= countc(days_missing, ',');
				do k=1 to count+1;
					if input(scan(days_missing, k, ','), best.) eq 4 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
				end; 
			end; 
		end;
	end; 
	else if lowcase(strip(form)) eq 'lab' then do;
		num= countc(days, ',');
		do i=1 to num+1;
			temp= input(scan(days, i, ','), best.);
			if temp eq 1 then do;
				count= countc(days_missing, ',');
				do k=1 to count+1;
					if input(scan(days_missing, k, ','), best.) eq 0 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
				end; 
			end; 
		end;
	end; 
	if substr(strip(days_missing), 1, 1) eq ',' then days_missing=substrn(strip(days_missing), 2);
	drop num temp count i k;
run;
tarheel13
Rhodochrosite | Level 12

for the second row, it gets days_missing='4,3' rather than '4,30'

Oligolas
Barite | Level 11

Hi,

Use a regular expression:

data want;
   set have;
   length days_missing2 $30;
   days_missing2=days_missing;
   if form in ('exam' 'lab') then do;
      days_missing2=prxchange('s/\D0(?!\d)//',-1,days_missing2);
      days_missing2=prxchange('s/(?<!\d)0\D//',-1,days_missing2);
   end;
   if form='exam' then do;
      days_missing2=prxchange('s/\D4(?!\d)//',-1,days_missing2);
      days_missing2=prxchange('s/(?<!\d)4\D//',-1,days_missing2);
   end;
run;  

 

________________________

- Cheers -

tarheel13
Rhodochrosite | Level 12

I find regular expressions are still underused by most SAS programmers. Great solution!

Oligolas
Barite | Level 11

Edit: the check on days must be added

data want;
   set have;
   length days_missing2 $30;
   days_missing2=days_missing;
   if form in ('exam' 'lab') and prxmatch('/(?<!\d)1(?!\d)/',days) then do;
      days_missing2=prxchange('s/\D0(?!\d)//',-1,days_missing2);
      days_missing2=prxchange('s/(?<!\d)0\D//',-1,days_missing2);
   end;
   if form='exam' and prxmatch('/(?<!\d)3(?!\d)/',days)then do;
      days_missing2=prxchange('s/\D4(?!\d)//',-1,days_missing2);
      days_missing2=prxchange('s/(?<!\d)4\D//',-1,days_missing2);
   end;
run;  

 

________________________

- Cheers -

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1003 views
  • 3 likes
  • 7 in conversation