I have a dataset with tons of material numbers who can have up to 44 codes each (all of 6 characters). I want to create new variables (ETT_1, ETT_2 and so on) which take the value of the codes starting with "ETT". Some of the materialnumbers can have >1 codes containing "ETT", why I have maked a count variable (count_ETT). So I should make a syntax for ETT_1 which says "if count_ETT=1 then ETT_1= the value of the first code (code_1-code_44) containing "ETT"), and a make a syntax for ETT_2 which says "if count_ETT=2 then ETT_2= the value of the second code (code_1-code_44) containing "ETT"). Is this possible??
Matnr | Code_1 | Code_2 | Code_3 | Code_4 | … | Code_44 | count_ETT | ETT.1 | ETT.2 |
1 | P40000 | T98000 | ETT800 | T98000 | ETT600 | 2 | ETT800 | ETT600 | |
2 | P70000 | T30300 | M87700 | ETT600 | T98000 | 1 | ETT600 | ||
3 | P33000 | ETT600 | M87710 | T55500 | T98000 | 1 | ETT600 |
Seems like a waste of time but pretty simple. Make an array of the input variables and another array to define the output variables. Then loop over the input variables checking for the ETT values.
data have ;
input Matnr (Code_1-Code_5) (:$8.) ;
cards;
1 P40000 T98000 ETT800 T98000 ETT600
2 P70000 T30300 M87700 ETT600 T98000
3 P33000 ETT600 M87710 T55500 T98000
;
data want;
set have ;
array in code_1-code_5;
array out $8 ett_1-ett_5;
count_ett=0;
do index=1 to dim(in);
if in[index]=:'ETT' then do;
count_ett+1;
out[count_ett]=in[index];
end;
end;
drop index;
run;
proc print;
run;
count_ Obs Matnr Code_1 Code_2 Code_3 Code_4 Code_5 ett_1 ett_2 ett_3 ett_4 ett_5 ett 1 1 P40000 T98000 ETT800 T98000 ETT600 ETT800 ETT600 2 2 2 P70000 T30300 M87700 ETT600 T98000 ETT600 1 3 3 P33000 ETT600 M87710 T55500 T98000 ETT600 1
First transpose to a long dataset, with matnr and code (keep an additional running position number if that is important). Then it is only a WHERE and a count:
data want;
set have;
where substr(code,1,3) = "ETT";
by matnr;
if first.matnr
then ett_count = 1;
else ett_count + 1;
run;
data want;
set have end=eof;
retain max_count 0;
array cd {*} code_1-code_44;
array et {*} ETT_1-ETT_44;
count_ETT = 0;
do i=1 to dim(cd);
if substr(cd(i),1,3='ETT') then do;
count_ETT +1;
max_count = mac(max_count, count_ETT;)
et(count_ETT) = cd(i);
end;
end;
drop i;
if eof and max_count < 44 then do;
put "Variables ETT_n for n > " max_count " can be drpped";
call symput('Max_Count', put(max_count,2.));
end;
run;
/*** Drop exess ETT_ varaibles ***/
%macro dropx;
data want;
set want;
drop
%do i=&max_count to 44;
ETT_&i%str( );
%end; ;
run;
%mend;
%dropx;
the code won't work. I've changed places in the code where I think you've made typos, but still won't work. See code, log and table ( I made a demo with only code_1 to code_4. I'm working on a secured server, so can't copy the SAScode directly in ).
1) It would be easier to test a code if you supply a test data using data step with infile statement.
2) I hope next fixed code will work. See remarks on updated lines:
%let max_count = 0; /* initial value */ data want; set have end=eof; retain max_count 0; length code_1-code_44 $6; /* line added */ array cd {*} $ code_1-code_44; /* $ added */ array et {*} ETT_1-ETT_44; count_ETT = 0; do i=1 to dim(cd); if substr(cd(i),1,3='ETT') then do; count_ETT +1; max_count = mac(max_count, count_ETT;) et(count_ETT) = cd(i); end; end; drop i; if eof and max_count < 44 then do; put "Variables ETT_n for n > " max_count " can be drpped"; call symput('Max_Count', put(max_count,2.)); end; run; /*** Drop exess ETT_ varaibles ***/ %macro dropx; %if &max_count=0 %then %goto exit; data want; set want; drop %do i=&max_count to 44; ETT_&i%str( ); %end; ; run; %exit: %mend; %dropx;
Seems like a waste of time but pretty simple. Make an array of the input variables and another array to define the output variables. Then loop over the input variables checking for the ETT values.
data have ;
input Matnr (Code_1-Code_5) (:$8.) ;
cards;
1 P40000 T98000 ETT800 T98000 ETT600
2 P70000 T30300 M87700 ETT600 T98000
3 P33000 ETT600 M87710 T55500 T98000
;
data want;
set have ;
array in code_1-code_5;
array out $8 ett_1-ett_5;
count_ett=0;
do index=1 to dim(in);
if in[index]=:'ETT' then do;
count_ett+1;
out[count_ett]=in[index];
end;
end;
drop index;
run;
proc print;
run;
count_ Obs Matnr Code_1 Code_2 Code_3 Code_4 Code_5 ett_1 ett_2 ett_3 ett_4 ett_5 ett 1 1 P40000 T98000 ETT800 T98000 ETT600 ETT800 ETT600 2 2 2 P70000 T30300 M87700 ETT600 T98000 ETT600 1 3 3 P33000 ETT600 M87710 T55500 T98000 ETT600 1
Thank you Tom !
The ETT code is a code for thickness in mm, where the numbers are mm from 0,01 mm to 20 mm:
ETT001=0,01 mm
ETT002=0,02 mm
...
ETT100=1,00 mm
...
ETT999= 9,99 mm
ETTX10=10 mm
ETTX11= 11 mm
...
ETTX20= 20 mm
Is it possible to convert the codes to mm ?
Are the mappings arbitrary or is there a pattern.
If the arbitrary the make a format.
proc format ;
value $ett2mm
'ETT001'='0,01 mm'
...
'ETTX20'= '20 mm'
;
run;
You could then either attach the format to the variables.
format ett: $ett2mm.
Or use it in the step that is moving the values.
out[count_ett]=put(in[index],$ett2mm.);
If there is a pattern then you can use input to read the values from the end of the strings into actual numbers. You can get INPUT to imply the decimal point by adding .2 to the end of the informat specification.
...
array out ett_1-ett_5;
...
if in[index] =: 'ETTX' then out[count_ett]=input(substr(in[index],5),32.);
else out[count_ett]=input(substr(in[index],4),32.2);
There is a patteren.
What should I fill in where you typed "...", sorry. (I am very new to SAS, normally using SPSS).
array in ett_1-ett_5;
array out ett_1-ett_5;
do index=1 to dim(in);
if in[index] =: 'ETTX' then out[count_ett]=input(substr(in[index],5),32.);
else out[count_ett]=input(substr(in[index],4),32.2);
end;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.