Hello there, among other variables, I have a data set with 6 character variables (length 4) like this:
Obs | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
1 | O809 | O809 | Z370 | Z301 | Z390 | A20X |
2 | B171 | B172 | K746 | I10X | I519 | K546 |
3 | O809 | O809 | Z370 | Z302 | Z390 | |
4 | X101 | X102 | I12X | | | |
and I would like:
1.- to get rid of the "X", but only for entries with the "X" at the end, so the data would look like:
Obs | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
1 | O809 | O809 | Z370 | Z301 | Z390 | A20 |
2 | B171 | B172 | K746 | I10 | I519 | K546 |
3 | O809 | O809 | Z370 | Z302 | Z390 | |
4 | X101 | X102 | I12 | | | |
then
2.- I would like to create new variables that flag values within a specific range. For example, if I want a new variable to flag any entry of Var1 to Var6 between Z301 and Z370, the new variable would look like "VariableZ" in the example below. If I want other new variables to flag any value of Var1 to Var6 between B171 and B172, or between X101 and X102, they would look like "VariableB" and "VariableX" below, respectively, and so on.
New data set:
Obs | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 | VariableZ | VariableB | VariableX |
1 | O809 | O809 | Z370 | Z301 | Z390 | A20 | Z301-Z370 | | |
2 | B171 | B172 | K746 | I10 | I519 | K546 | | B171-B172 | |
3 | O809 | O829 | Z370 | Z302 | Z390 | | Z301-Z370 | | |
4 | X101 | D109 | I12 | | | | | | X101-X102 |
All your help will be appreciated.
Thanks,
EHG.
This is a good task to show off the benefits of the CHAR function (extracts a single character substring), the INPUT function, and especially the SELECT statement:
data have;
input (Var1-Var6) ($);
cards;
O809 O809 Z370 Z301 Z390 A20X
B171 B172 K746 I10X I519 K546
O809 O809 Z370 Z302 Z390 .
X101 X102 I12X . . .
;
data want;
set have;
array v {*} var1-var6;
do I=1 to dim(v);
  /* Part 1: blank out a trailing X (position 4) in every variable, not just Var6 */
  if char(v{I},4)='X' then substr(v{I},4,1)=' ';
  /* Part 2: branch on the first letter, then test the numeric part of the code */
  select (char(v{I},1));
    when ('Z') if 301 <= input(substr(v{I},2),best32.) <= 370 then varz='Z301-Z370';
    when ('B') if 171 <= input(substr(v{I},2),best32.) <= 172 then varb='B171-B172';
    when ('X') if 101 <= input(substr(v{I},2),best32.) <= 102 then varx='X101-X102';
    otherwise ;
  end;
end;
drop I;
run;
The INPUT function uses the BEST32. informat as an insurance policy: because a width of 32 is longer than any substring being converted here, you generally don't have to worry about the length of the character values being INPUTed.
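A small sketch of why the width matters, assuming the cleaned 4-character codes from the sample data: the INPUT function reads at most w characters of its argument, so an informat narrower than the string truncates it, while BEST32. covers any substring of these short codes. The variable names `code`, `n_wide` and `n_narrow` are illustrative only.

```sas
/* Illustrative only: compare a wide and a narrow informat on the */
/* digits that follow the first letter of each code.              */
data _null_;
  length code $4;
  do code = 'Z301', 'I10', 'B171';
    n_wide   = input(substr(code, 2), best32.); /* reads the whole substring    */
    n_narrow = input(substr(code, 2), best2.);  /* reads only the first 2 chars */
    put code= n_wide= n_narrow=;
  end;
run;
```

In the log, n_narrow for Z301 comes out as 30 instead of 301, which is exactly the kind of silent truncation the wide informat guards against.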
Part 1 is pretty easy. In a DATA step:
data want;
  set have;
  array var {6} $ var1-var6;
  do i=1 to 6;
    if substr(var{i}, 4, 1) = 'X' then substr(var{i}, 4, 1) = ' '; /* trailing X only */
  end;
  drop i;
run;
For part 2, I'm not really sure what you are trying to achieve here. But I would warn you about range tests on character values: strings are compared character by character, so, as character strings, "Z32" falls within the range of "Z301" through "Z370".
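A quick way to see the pitfall, as a sketch: compare a pure character range test with a test that checks the leading letter and converts the rest to a number. 'Z32' is a deliberately malformed code, and the variable names `char_in` and `num_in` are illustrative, not from the original posts.

```sas
data _null_;
  length code $4;
  do code = 'Z301', 'Z32', 'Z370', 'Z390';
    /* character comparison: strings are compared position by position */
    char_in = ('Z301' <= code <= 'Z370');
    /* numeric comparison on the part after the letter; the ?? modifier */
    /* suppresses invalid-data notes if that part is not a number      */
    num_in = (char(code,1) = 'Z' and 301 <= input(substr(code,2), ?? best32.) <= 370);
    put code= char_in= num_in=;
  end;
run;
```

For Z32 the character test returns 1 while the numeric test returns 0; the two tests agree for the well-formed codes.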
I'm sure someone (Hi @Reeza) will complain about my using DO OVER, but the following is an easy way to accomplish both tasks:
data have;
  input (Var1-Var6) ($);
  cards;
O809 O809 Z370 Z301 Z390 A20X
B171 B172 K746 I10X I519 K546
O809 O809 Z370 Z302 Z390 .
X101 X102 I12X . . .
;

data want;
  set have;
  array stuff var1-var6;
  do over stuff;
    if substr(stuff,length(stuff),1) eq 'X' then substr(stuff,length(stuff),1) = '';
    if substr(stuff,1,2) eq 'Z3' and 1<=input(substr(stuff,3,2),8.)<=70 then variablez='Z301-Z370';
    if stuff in ('B171','B172') then variableb='B171-B172';
  end;
run;
HTH,
Art, CEO, AnalystFinder.com
Thank you, everybody.
The code worked just fine,
and your help is very much appreciated.