Hi everyone!
I need a bit of help thinking this out and finding the best way to handle this. Any suggestions on how I should attempt this are appreciate! Thank you in advanced!
So I have a data set with 3 variables, ID - lab name - lab result. Not all the results are numeric and this lab result variable is a character variable (so is the lab name variable). Every ID has many lab names. There are only 70 ID's but over 56,000 labs and about 500 different labs within this group .
numeric character character
id | labname | labresults |
10 | protine | negative |
10 | Platelets | 300 |
10 | carbon dioxide | 25 |
12 | Platelets | 260 |
12 | Bun | 9 |
33 | Glucose | 101 |
33 | Glucose | 100 |
33 | carbon dioxide | 26 |
33 | Glucose | 102 |
16 | protine | negative |
16 | platelets | 290 |
16 | Total protine | 9 |
16 | platelets | 140 |
23 | carbon dioxide | 26 |
23 | Bun | 10 |
23 | Total protine | 8.5 |
23 | protine | negative |
46 | Glucose | 130 |
46 | Bun | 11 |
In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to keep only the results that have numeric variables. It only need to be for those numeric values
id | protine | platelets | glucose |
10 | average | ||
12 | average | ||
16 | |||
23 |
I feel there are a 100 ways I can proceed.
I know have to find a way to sort through the results, and keep only those that are numeric.
I want to transpose the data, but so far this hasn't worked.
I'm thinking of writing a macro where I would feed it the ID's and have it run and gather all the test results for that ID. However I think I might be making this too complicated.
Any suggestions would be super appreciated! Thank You
It looks to be test issue. Certain tests report numbers, others report text. Ideally you'd know which was which, but it's not difficult to approximate this.
data numeric;
set have;
result_number = input(labresults, ?? best12.);
run;
proc means data=numeric noprint nway;
class id labname;
var result_number;
output out=summary mean(result_number) = average;
run;
Then transpose the result to the shown format - PROC TRANSPOSE. The variables without a numeric value will sort of filter themselves out. Or you can explicitly extract those results by adding a WHERE to the PROC MEANS to exclude missing data.
@Curly wrote:
Hi everyone!
I need a bit of help thinking this out and finding the best way to handle this. Any suggestions on how I should attempt this are appreciate! Thank you in advanced!
So I have a data set with 3 variables, ID - lab name - lab result. Not all the results are numeric and this lab result variable is a character variable (so is the lab name variable). Every ID has many lab names. There are only 70 ID's but over 56,000 labs and about 500 different labs within this group .
In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to drop the results that have numeric variables.
I feel there are a 100 ways I can proceed.
I know have to find a way to sort through the results, and keep only those that are numeric.
I want to transpose the data, but so far this hasn't worked.
I'm thinking of writing a macro where I would feed it the ID's and have it run and gather all the test results for that ID. However I think I might be making this too complicated.
Any suggestions would be super appreciated! Thank You
I'm having a super hard time following this...please post sample data and output to illustrate what you want. It doesn't have to be real data, but data that illustrates your problem.
If you need to do the same thing multiple times, the usual method is a BY or CLASS statement with PROC MEANS but it seems like you also need to add some data cleaning steps to get the data to a form that works for you first.
I highly suggest posting what you have and what you want and we can suggest the various methods, sometimes we disagree on the best methods on here 🙂
A quick example of one way of calculating statistics per group:
proc sql;
create table want as
select *, mean(weight) as weight_by_sex
from sashelp.class
group by sex
order by name;
quit;
Here's one approach.
Create a numeric variable holding the lab result, when that is possible:
numeric_result = input(lab_result, ??12.);
At this point, the lazy way would be to compute means of the numeric variable. But an intermediate step would normally be appropriate:
proc freq data=converted;
where numeric_result=.;
tables lab_result;
run;
This shows all the lab results that could not be converted directly to numeric. You may see patterns there that could be used (such as ignoring units of measure after a numeric value).
Once you have computed as many numeric values as possible, then generate the mean values.
@Curly wrote:
In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to drop the results that have numeric variables.
I have to say that this statement is very confusing. If you drop numeric values what dos is mean to have an "average result"?
Perhaps you mean something like a MODE, the most common count of the non-numeric values?
Example data, input and out for the input, goes a long way to clarifying meaning.
It looks to be test issue. Certain tests report numbers, others report text. Ideally you'd know which was which, but it's not difficult to approximate this.
data numeric;
set have;
result_number = input(labresults, ?? best12.);
run;
proc means data=numeric noprint nway;
class id labname;
var result_number;
output out=summary mean(result_number) = average;
run;
Then transpose the result to the shown format - PROC TRANSPOSE. The variables without a numeric value will sort of filter themselves out. Or you can explicitly extract those results by adding a WHERE to the PROC MEANS to exclude missing data.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.