I want to create a new variable (Count_M87) which count the occurrences of a group of specific value (ex. 'M8700', 'M8741', 'M8788', 'M87900') through 44 variables (Code_1-Code_44).
Dataset want (not illustrated all 44 code_ variables):
ID Code_1 Code_2 Code_3 Code_4 Code_5 ...... Code_44 Count_M87
1 M8700 T001 M8741 p0750 M9634 M8788 3
2 T001 M8741 M8788 p0750 T001 T001 2
@MABRINCH wrote:
No to the first rule. if M8700 occurs 5 times and M8741 occurs 2 times it should be counted as 7.
yes to the second rule - it needs to be the specific value to be counted.
In somehow yes - the M87XX is pathology codes for histotypes of melanoma.
If you need to search an exact list then I think this may work for you, if @Reeza's solution didn't. It is a minor change really.
data want; set have; array t (4) $ 8 _temporary_ ('M8700', 'M8741', 'M8788', 'M87900'); do i= 1 to dim(t); count_m87= sum(count_m87,count(catx(' ', of code_1-code_44),STRIP( t[i]), 'i')); end;
drop i; run;
The array T is to hold temporary values, i.e. not written to the data set. The number of elements, (4) above, has to match the number of values you want to use. The $ 8 on the array says the values will be character with a maximum length of 8, adjust if you know they are all shorter or may be longer. The keyword _temporary_ says "don't keep these array variables in the output". The list of values is comma delimited and quoted and matches the number of elements. You'll get a warning like:
WARNING: Too many values for initialization of the array t. Excess values are ignored.
if you have more values than the definition will use.
The Strip function around the array element T[i] is because the full length of the variable, including trailing blanks, gets used. Since you showed at least one target value longer than the others this would arise with your values and this is common issue with many of the character searching functions.
count_M87 = count(catx(' ', of code_1-code_44), 'M87', 'i');
Try that in a data step and let me know if it works.
Based on third example out of the documentation for COUNT
@MABRINCH wrote:
I want to create a new variable (Count_M87) which count the occurrences of a group of specific value (ex. 'M8700', 'M8741', 'M8788', 'M87900') through 44 variables (Code_1-Code_44).
Dataset want (not illustrated all 44 code_ variables):
ID Code_1 Code_2 Code_3 Code_4 Code_5 ...... Code_44 Count_M87
1 M8700 T001 M8741 p0750 M9634 M8788 3
2 T001 M8741 M8788 p0750 T001 T001 2
Do you have any rules like if 'M8700' occurs 5 times that it should only be counted once (i.e. count of unique values).
Do you have values like 'M8700.1' that are not supposed to be counted even though 'M8700' should?
Just for giggles, by any chance is this ICD-10 related?
@MABRINCH wrote:
No to the first rule. if M8700 occurs 5 times and M8741 occurs 2 times it should be counted as 7.
yes to the second rule - it needs to be the specific value to be counted.
In somehow yes - the M87XX is pathology codes for histotypes of melanoma.
If you need to search an exact list then I think this may work for you, if @Reeza's solution didn't. It is a minor change really.
data want; set have; array t (4) $ 8 _temporary_ ('M8700', 'M8741', 'M8788', 'M87900'); do i= 1 to dim(t); count_m87= sum(count_m87,count(catx(' ', of code_1-code_44),STRIP( t[i]), 'i')); end;
drop i; run;
The array T is to hold temporary values, i.e. not written to the data set. The number of elements, (4) above, has to match the number of values you want to use. The $ 8 on the array says the values will be character with a maximum length of 8, adjust if you know they are all shorter or may be longer. The keyword _temporary_ says "don't keep these array variables in the output". The list of values is comma delimited and quoted and matches the number of elements. You'll get a warning like:
WARNING: Too many values for initialization of the array t. Excess values are ignored.
if you have more values than the definition will use.
The Strip function around the array element T[i] is because the full length of the variable, including trailing blanks, gets used. Since you showed at least one target value longer than the others this would arise with your values and this is common issue with many of the character searching functions.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.