I'm working with the NIS (national inpatient sample) and in dire need some help...
I am trying to find the-time-to-procedure, which is a numeric variable. This is the way NIS database is laid out:
Each observation can have multiple procedures (up to 15, i.e. PR_1 to PR_15). I have been using "array" statements to search for the procedures within each observation. I have been successful in obtaining counts (i.e. how many observations have a certain procedure of interest).
However, to obtain the-time-to-procedure, I have to know the exact variable name (between PR_1 to PR_15) which the procedure falls under; because the-time-to-procedure variable is also named between x_1 to x _15 corresponding to number the procedure falls under PR_1 to PR_15 (depending on which variable_number the procedure was coded).
i.e.
If PR_2 = procedure of interest, PR_time_2 = time-to-procedure value
If PR_12 = procedure of interest, PR_time_12 = time to procedure value
The dataset has millions of observations. Therefore, I need a way to do this automatically defining the variable name (with the procedure of interest) and using that information to find the value (time to procedure) within each observation. Is there a way to do this using some sort of array or SQL language?
Any help would be so appreciated,
Hi:
I'm not sure I completely understand your data structure. This is what I envision -- but on a larger scale:
I only used a small number of "grp" variables and "tval" variables that were numeric because 3 of each is enough to illustrate the point.
So if you only want to find one code and its corresponding time value, you would have something like this, where the code I am looking for is AAB, which might appear in just one group or multiple groups for an ID or might not appear at all or might appear only once. As you can see, in my fake data, the code AAB appears 5 times:
But, if you want to find more than one code and the corresponding time values, you could use something like this, with a user-defined format. In this instance, I wanted to find codes AAB and BBC, as shown below:
Of course, there are other "lookup" techniques possible, but it really depends on the structure of your data and the ultimate use of the program (report? data set?). And these examples all assume that each "grp" variable has a corresponding "tval" variable.
Hope this helps,
Cynthia
Nice demonstration @Cynthia_sas
It sounds like you are concerned with finding the related array element in the Pr_time values for a given value of Pr.
Without example data it is a bit difficult to demonstrate specific functions due to character/numeric but this may get you started.
data example; input pr_1 - pr_3 pr_time1-pr_time3; array pr pr_1-pr_3; array t pr_time1-pr_time3 ; time= t[whichn(6,of pr(*))]; datalines; 3 2 6 15 18 25 ; run;
The WHICHN (of if the values are character WHICHC) searches for the first value, which would the known value you are searching for among the array values of pr and returns the index of the first variable containing the desired value. or 0 if not found.
Since you would in likelihood not always have a found value you would likely want to have something like
if whichn(6,of pr(*))>0 then time= t[whichn(6,of pr(*))];
Note that is possibly to functions inside array index locations as long as the function will return integer values (within the defined array size to avoid out of range error messages).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.