Hello, I need to remove characters from a string based on multiple conditions.
The data I have looks like this:
Animal Form Days Days_Expected Days_Missing
1001 exam 1,2,3,36,103 0,1,2,4,36,103 0,4
1001 lab 1,2,3,36,100 0,1,2,4,30,100 0,4,30
So I need a function that will remove characters from the missing value based on criteria in the days value.
For instance, If form=exam and days contains '3', then remove '4' from days_missing and if form=exam and days contains '1', then remove '0' from days missing. Also, if form=lab and days contains '1' then remove '0' from days_missing
The outcome wanted would be this:
Animal Form Days Days_Expected Days_Missing
1001 exam 1,2,3,36,103 0,1,2,4,36,103
1001 lab 1,2,3,36,100 0,1,2,4,30,100 4,30
Is that ALL of the rules? Are there other values for "Form" that are going to have conditions?
I think that you want to carefully restate your conditions and what is removed. If you remove '0' then '103' becomes '13' if it made it into the "days missing". Similarly the '30' would become '3'.
And show the code you are actually using to identify the "days-missing" as it may be better to apply the filters before adding the value to 'days_missing' to begin with. As apparently they are not missing and not supposed to be missing.
data have;
length animal $4 form $4 days $40 days_expected $40 days_missing $40;
animal='1001';
form='exam';
days='1,2,3,36,103';
days_expected='0,1,2,4,36,103';
days_missing='0,4';
output;
animal='1001';
form='lab';
days='1,2,3,36,100';
days_expected='0,1,2,4,30,100';
days_missing='0,4,30';
output;
proc print;
run;
Yes, the transtrn as coded will remove '4,', but it did not remove the trailing '4' (as in '0,4') in the first observation. And it would also change '0,14,'16' to '0,116'.
Edited note:
Earlier, there was a suggestion, now deleted, of using the TRANSTRN function, which prompted my response above. Here is a way that TRANSTRN can be used without the problems I mentioned. It requires prepending and appending a ',' to days_missing, then run TRANSTRN against individual values, also preceded and succeeded by commas. Finally remove the extra commas:
data want;
set have;
length new_days_missing $42;
new_days_missing=cats(',',days_missing,',');
if form='exam' then do;
if findw(days,'3',',') then new_days_missing=transtrn(new_days_missing,',4,',',');
if findw(days,'1',',') then new_days_missing=transtrn(new_days_missing,',0,',',');
end;
else if form='lab' then do;
if findw(days,'1',',') then new_days_missing=transtrn(new_days_missing,',0,',',');
end;
if new_days_missing=',' then new_days_missing=' ';
else new_days_missing=substr(new_days_missing,2,length(new_days_missing)-2);
run;
do you have a better solution?
@tarheel13 wrote:
do you have a better solution?
Since we now have separate step from the original question of "determine days missing" I would say to place the values of days missing into an array, possibly temporary. Then we have NUMERIC values and can match them to (input(scan(days,i)) type code.
Which I would suggest doing at creating days missing instead of continuing with a poor data approach to begin with.
Perhaps after all the rules (any bets we get at least one more set related to this?) have been applied it might be worth sticking multiple values into a single one.
Really reading all of days, expected days into arrays likely makes this a BUNCH simpler in the long run than attempting to deal with character strings.
I saw your other post about how to generate days_missing. I think you are making life VERY hard for yourself by designing a data structure with these comma separated lists stored in character variables. Typically things are easier if you work with a vertical structure. I don't quite understand your data, but it looks like maybe you are keeping track of forms that are due (expected), and whether or not they have been received.
Instead of structuring like:
Animal Form Days Days_Expected Days_Missing 1001 exam 1,2,3,36,103 0,1,2,4,36,103 0,4 1001 lab 1,2,3,36,100 0,1,2,4,30,100 0,4,30
You could structure to have one record per form that is expected, with a boolean variable to indicate whether or not the form was received e.g.:
Animal Form Day Received 1001 exam 0 0 1001 exam 1 1 1001 exam 2 1 1001 exam 4 0 1001 exam 36 1 1001 exam 103 1 1001 lab 0 0 1001 lab 1 1 1001 lab 2 1 1001 lab 4 0 1001 lab 30 0 1001 lab 100 1
With a structure like that, it's much easier to work with.
data want;
set have;
if lowcase(strip(form)) eq 'exam' then do;
num= countc(days, ',');
do i=1 to num+1;
temp= input(scan(days, i, ','), best.);
if temp eq 3 then do;
count= countc(days_missing, ',');
do k=1 to count+1;
if input(scan(days_missing, k, ','), best.) eq 0 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
end;
end;
else if temp eq 1 then do;
count= countc(days_missing, ',');
do k=1 to count+1;
if input(scan(days_missing, k, ','), best.) eq 4 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
end;
end;
end;
end;
else if lowcase(strip(form)) eq 'lab' then do;
num= countc(days, ',');
do i=1 to num+1;
temp= input(scan(days, i, ','), best.);
if temp eq 1 then do;
count= countc(days_missing, ',');
do k=1 to count+1;
if input(scan(days_missing, k, ','), best.) eq 0 then days_missing= tranwrd(days_missing,scan(days_missing, k, ','), '');
end;
end;
end;
end;
if substr(strip(days_missing), 1, 1) eq ',' then days_missing=substrn(strip(days_missing), 2);
drop num temp count i k;
run;
for the second row, it gets days_missing='4,3' rather than '4,30'
Hi,
Use a regular expression:
data want;
set have;
length days_missing2 $30;
days_missing2=days_missing;
if form in ('exam' 'lab') then do;
days_missing2=prxchange('s/\D0(?!\d)//',-1,days_missing2);
days_missing2=prxchange('s/(?<!\d)0\D//',-1,days_missing2);
end;
if form='exam' then do;
days_missing2=prxchange('s/\D4(?!\d)//',-1,days_missing2);
days_missing2=prxchange('s/(?<!\d)4\D//',-1,days_missing2);
end;
run;
- Cheers -
I find regular expressions are still underused by most SAS programmers. Great solution!
Edit: the check on days must be added
data want;
set have;
length days_missing2 $30;
days_missing2=days_missing;
if form in ('exam' 'lab') and prxmatch('/(?<!\d)1(?!\d)/',days) then do;
days_missing2=prxchange('s/\D0(?!\d)//',-1,days_missing2);
days_missing2=prxchange('s/(?<!\d)0\D//',-1,days_missing2);
end;
if form='exam' and prxmatch('/(?<!\d)3(?!\d)/',days)then do;
days_missing2=prxchange('s/\D4(?!\d)//',-1,days_missing2);
days_missing2=prxchange('s/(?<!\d)4\D//',-1,days_missing2);
end;
run;
- Cheers -
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.