BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Curly
Fluorite | Level 6

Hi everyone!

 

I need a bit of help thinking this out and finding the best way to handle this. Any suggestions on how I should attempt this are appreciate! Thank you in advanced!

 

So I have a data set with 3 variables, ID - lab name - lab result. Not all the results are numeric and this lab result variable is a character variable (so is the lab name variable). Every ID has many lab names. There are only 70 ID's but over 56,000 labs and about 500 different labs within this group .

 

 

 numeric  character  character

idlabnamelabresults
10protinenegative 
10Platelets300
10carbon dioxide25
12Platelets260
12Bun9
33Glucose101
33Glucose100
33carbon dioxide26
33Glucose102
16protinenegative 
16platelets 290
16Total protine9
16platelets140
23carbon dioxide26
23Bun10
23Total protine8.5
23protinenegative 
46Glucose130
46Bun11

 

In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to keep only the results that have numeric variables. It only need to be for those numeric values

 

idprotineplateletsglucose
10average  
12 average 
16   
23   

 

I feel there are a 100 ways I can proceed.

I know have to find a way to sort through the results, and keep only those that are numeric.

I want to transpose the data, but so far this hasn't worked.

I'm thinking of writing a macro where I would feed it the ID's and have it run and gather all the test results for that ID. However I think I might be making this too complicated.

 

Any suggestions would be super appreciated! Thank You

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

It looks to be test issue. Certain tests report numbers, others report text. Ideally you'd know which was which, but it's not difficult to approximate this. 

 

 

data numeric;
set have;

result_number  = input(labresults, ?? best12.);

run;

proc means data=numeric noprint nway; 
class id labname;
var result_number;
output out=summary mean(result_number) = average;
run;

 

Then transpose the result to the shown format - PROC TRANSPOSE. The variables without a numeric value will sort of filter themselves out. Or you can explicitly extract those results by adding a WHERE to the PROC MEANS to exclude missing data.

View solution in original post

7 REPLIES 7
Reeza
Super User

@Curly wrote:

Hi everyone!

 

I need a bit of help thinking this out and finding the best way to handle this. Any suggestions on how I should attempt this are appreciate! Thank you in advanced!

 

So I have a data set with 3 variables, ID - lab name - lab result. Not all the results are numeric and this lab result variable is a character variable (so is the lab name variable). Every ID has many lab names. There are only 70 ID's but over 56,000 labs and about 500 different labs within this group .

 

 

In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to drop the results that have numeric variables.

 

I feel there are a 100 ways I can proceed.

I know have to find a way to sort through the results, and keep only those that are numeric.

I want to transpose the data, but so far this hasn't worked.

I'm thinking of writing a macro where I would feed it the ID's and have it run and gather all the test results for that ID. However I think I might be making this too complicated.

 

Any suggestions would be super appreciated! Thank You


I'm having a super hard time following this...please post sample data and output to illustrate what you want. It doesn't have to be real data, but data that illustrates your problem. 

 

If you need to do the same thing multiple times, the usual method is a BY or CLASS statement with PROC MEANS but it seems like you also need to add some data cleaning steps to get the data to a form that works for you first. 

 

I highly suggest posting what you have and what you want and we can suggest the various methods, sometimes we disagree on the best methods on here 🙂

 

A quick example of one way of calculating statistics per group:

proc sql;
create table want as 
select *, mean(weight) as weight_by_sex
from sashelp.class
group by sex
order by name;
quit;

 

Curly
Fluorite | Level 6
I added a few tables the type of data I am looking at.
Also thank you for your suggestion. I'm pretty new to sas, I have seen the class command, but I will definitely look more into it. I think it can be useful!
Astounding
PROC Star

Here's one approach.

 

Create a numeric variable holding the lab result, when that is possible:

 

numeric_result = input(lab_result, ??12.);

 

At this point, the lazy way would be to compute means of the numeric variable.  But an intermediate step would normally be appropriate:

 

proc freq data=converted;

where numeric_result=.;

tables lab_result;

run;

 

This shows all the lab results that could not be converted directly to numeric.  You may see patterns there that could be used (such as ignoring units of measure after a numeric value).

 

Once you have computed as many numeric values as possible, then generate the mean values.

ballardw
Super User

@Curly wrote:

 

In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to drop the results that have numeric variables.

 


I have to say that this statement is very confusing. If you drop numeric values what dos is mean to have an "average result"?

Perhaps you mean something like a MODE, the most common count of the non-numeric values?

 

Example data, input and out for the input, goes a long way to clarifying meaning.

Curly
Fluorite | Level 6
So sorry about the typo. I mean to drop the results that appear as characters, and keep all the results that are numeric. Thank you for that!
Reeza
Super User

It looks to be test issue. Certain tests report numbers, others report text. Ideally you'd know which was which, but it's not difficult to approximate this. 

 

 

data numeric;
set have;

result_number  = input(labresults, ?? best12.);

run;

proc means data=numeric noprint nway; 
class id labname;
var result_number;
output out=summary mean(result_number) = average;
run;

 

Then transpose the result to the shown format - PROC TRANSPOSE. The variables without a numeric value will sort of filter themselves out. Or you can explicitly extract those results by adding a WHERE to the PROC MEANS to exclude missing data.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 3898 views
  • 3 likes
  • 4 in conversation