data ss;
infile datalines;
input Patient_Number Encounter_Number Birth_Date Diagnosis_1$ Diagnosis_2$ Diagnosis_3$ Diagnosis_4$ Diagnosis_5$;
datalines;
1 1 31JUL1975 250.7 250.7 785.2
1 2 31JUL1975 250.3 250.3 288.8 995.93 466 250.1
1 3 31JUL1975 250.3 250.3 271.6 288.8
1 4 31JUL1975 250.3 250.3 250.1
1 5 31JUL1975 250.1 250.1
Array diag {5} $1 diagnosis_1-diagnosis_5;
keto=0;
do i=1 to 5;
if substr(diag[i],1,5) in ('250.1') then keto=1;
end;
drop i;
run;
If the first five positions of the 5 diagnoses are "250.1", that indicates "XX". Using "ARRAY" and the "SUBSTR" function, how do I generate a new "XX" indicator variable?
It will really help to provide an example of what the desired result should be.
For instance what if multiple variables meet the condition? Do you want multiple results for "keto"?
If you only want to know "at least one of the variables has a value with 250.1" then this may work:
data ss;
   infile datalines truncover;
   input Patient_Number Encounter_Number Birth_Date :date9.
         Diagnosis_1$ Diagnosis_2$ Diagnosis_3$ Diagnosis_4$ Diagnosis_5$;
   format birth_date date9.;
   keto = index(catx('_',of diag:),'250.1')>0;
datalines;
1 1 31JUL1975 250.7 250.7 785.2
1 2 31JUL1975 250.3 250.3 288.8 995.93 466 250.1
1 3 31JUL1975 250.3 250.3 271.6 . . . 288.8
1 4 31JUL1975 250.3 250.3 . . 250.1
1 5 31JUL1975 250.1 250.1
;
run;
INDEX finds '250.1' anywhere in the concatenated string, so a value such as 1250.1 would also set keto to 1, and this approach may not be appropriate. We don't know all your possible values.
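To make the false-positive risk concrete, here is a minimal sketch. The value '1250.1' is hypothetical (it does not appear in the posted data), and the variable names d1, d2 and joined are made up for illustration.

```sas
data _null_;
   length d1 d2 $8;
   d1 = '1250.1';                      /* hypothetical code, not from the posted data */
   d2 = '785.2';
   joined = catx('_', d1, d2);         /* same joining technique as above */
   keto = index(joined, '250.1') > 0;  /* matches anywhere in the string */
   put joined= keto=;                  /* log: joined=1250.1_785.2 keto=1 */
run;
```

Neither value starts with 250.1, yet keto comes out as 1 because the search string is found starting at position 2.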
So
data ss;
   infile datalines truncover;
   input Patient_Number Encounter_Number Birth_Date :date9.
         Diagnosis_1$ Diagnosis_2$ Diagnosis_3$ Diagnosis_4$ Diagnosis_5$;
   format birth_date date9.;
   array d diagnosis:;
   keto=0;
   do i= 1 to dim(d);
      if d[i] =: '250.1' then do;
         keto=1;
         leave;
      end;
   end;
datalines;
1 1 31JUL1975 250.7 250.7 785.2
1 2 31JUL1975 250.3 250.3 288.8 995.93 466 250.1
1 3 31JUL1975 250.3 250.3 271.6 . . . 288.8
1 4 31JUL1975 250.3 250.3 . . 250.1
1 5 31JUL1975 250.1 250.1
;
run;
The =: is a "begins with" comparison.
LEAVE stops the loop as soon as the condition is found to be true.
You might want to keep the I variable in the data set, since it records which of the diagnosis variables met the condition.
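One caveat if I is kept: when no diagnosis matches, the loop runs to completion and I ends at dim(d)+1 rather than missing. A minimal sketch that records the position in a separate variable instead (first_match and the data set name ss_which are hypothetical names, and only two of the posted rows are used):

```sas
data ss_which;
   infile datalines truncover;
   input Patient_Number Encounter_Number Birth_Date :date9.
         Diagnosis_1 $ Diagnosis_2 $ Diagnosis_3 $ Diagnosis_4 $ Diagnosis_5 $;
   format Birth_Date date9.;
   array d diagnosis:;
   keto = 0;
   first_match = .;              /* hypothetical name; stays missing if nothing matches */
   do i = 1 to dim(d);
      if d[i] =: '250.1' then do;
         keto = 1;
         first_match = i;        /* position of the first matching diagnosis */
         leave;
      end;
   end;
   drop i;
datalines;
1 1 31JUL1975 250.7 250.7 785.2
1 5 31JUL1975 250.1 250.1
;
run;
```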
@JUMMY wrote:
@ballardw, I want only one variable created from this, called XX. But I want to use the "substr" function too. Yours doesn't include that function.
You said "first five positions of 5 diagnoses are '250.1'", which is why I proposed the =: operator. If you use SUBSTR to request 5 positions and the value does not contain 5 positions, you have problems. See this code and the message it generates in the log.
data example;
   x='933';
   y = substr(x,1,5);
run;
By the time you add the extra code needed to handle shorter variables, the code is 1) less efficient and 2) just plain longer.
If there is no reason to use a function, why force a solution using it? That way lies bureaucratic madness.
substrn(x,1,5);
or SUBPAD depending on the result needed.
Like @ballardw I see no use for SUBSTR or to iterate over the array.
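For readers who nevertheless need the ARRAY-plus-substring form, a minimal sketch using SUBSTRN could look like the following. It reuses the posted rows with the corrected INPUT statement; the data set name ss_substrn is made up. With list input the diagnosis variables get the default length of 8, so plain SUBSTR(…,1,5) should also work on this particular data; SUBSTRN simply tolerates a requested length that runs past the end of the variable.

```sas
data ss_substrn;
   infile datalines truncover;
   input Patient_Number Encounter_Number Birth_Date :date9.
         Diagnosis_1 $ Diagnosis_2 $ Diagnosis_3 $ Diagnosis_4 $ Diagnosis_5 $;
   format Birth_Date date9.;
   array diag {5} Diagnosis_1-Diagnosis_5;
   keto = 0;
   do i = 1 to dim(diag);
      /* compare the first five characters of each diagnosis */
      if substrn(diag[i], 1, 5) = '250.1' then keto = 1;
   end;
   drop i;
datalines;
1 1 31JUL1975 250.7 250.7 785.2
1 2 31JUL1975 250.3 250.3 288.8 995.93 466 250.1
1 3 31JUL1975 250.3 250.3 271.6 288.8
1 4 31JUL1975 250.3 250.3 250.1
1 5 31JUL1975 250.1 250.1
;
run;
```

Only five diagnosis variables are read, so the sixth value on the second row is ignored in this sketch.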
@ballardw wrote:
If there is no reason to use a function, why force a solution using it? That way lies bureaucratic madness.
It's homework.
I don't understand the question. You already posted the code for when XX is KETO. What do you want to do differently?
data ss;
infile datalines missover;
input Patient_Number Encounter_Number Birth_Date :date9. @;
array DIAG[6] $5;
input diag[*];
keto = '250.1' in diag;
format Bir: date9.;
datalines;
1 1 31JUL1975 250.7 250.7 785.2
1 2 31JUL1975 250.3 250.3 288.8 995.93 466 250.1
1 3 31JUL1975 250.3 250.3 271.6 . . 288.8
1 4 31JUL1975 250.3 250.3 . 250.1
1 5 31JUL1975 250.1 250.1
;;;;
run;
proc print;
run;