how do I sort though character variables and ID's to make a simple table with averages?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

how do I sort though character variables and ID's to make a simple table with averages?

[ Edited ]

Hi everyone!

 

I need a bit of help thinking this out and finding the best way to handle this. Any suggestions on how I should attempt this are appreciate! Thank you in advanced!

 

So I have a data set with 3 variables, ID - lab name - lab result. Not all the results are numeric and this lab result variable is a character variable (so is the lab name variable). Every ID has many lab names. There are only 70 ID's but over 56,000 labs and about 500 different labs within this group .

 

 

 numeric  character  character

idlabnamelabresults
10protinenegative 
10Platelets300
10carbon dioxide25
12Platelets260
12Bun9
33Glucose101
33Glucose100
33carbon dioxide26
33Glucose102
16protinenegative 
16platelets 290
16Total protine9
16platelets140
23carbon dioxide26
23Bun10
23Total protine8.5
23protinenegative 
46Glucose130
46Bun11

 

In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to keep only the results that have numeric variables. It only need to be for those numeric values

 

idprotineplateletsglucose
10average  
12 average 
16   
23   

 

I feel there are a 100 ways I can proceed.

I know have to find a way to sort through the results, and keep only those that are numeric.

I want to transpose the data, but so far this hasn't worked.

I'm thinking of writing a macro where I would feed it the ID's and have it run and gather all the test results for that ID. However I think I might be making this too complicated.

 

Any suggestions would be super appreciated! Thank You

 

 


Accepted Solutions
Solution
‎03-14-2017 08:44 PM
Super User
Posts: 19,192

Re: how do I sort though character variables and ID's to make a simple table with averages?

It looks to be test issue. Certain tests report numbers, others report text. Ideally you'd know which was which, but it's not difficult to approximate this. 

 

 

data numeric;
set have;

result_number  = input(labresults, ?? best12.);

run;

proc means data=numeric noprint nway; 
class id labname;
var result_number;
output out=summary mean(result_number) = average;
run;

 

Then transpose the result to the shown format - PROC TRANSPOSE. The variables without a numeric value will sort of filter themselves out. Or you can explicitly extract those results by adding a WHERE to the PROC MEANS to exclude missing data.

View solution in original post


All Replies
Super User
Posts: 19,192

Re: how do I sort though character variables and ID's to make a simple table with averages?


Curly wrote:

Hi everyone!

 

I need a bit of help thinking this out and finding the best way to handle this. Any suggestions on how I should attempt this are appreciate! Thank you in advanced!

 

So I have a data set with 3 variables, ID - lab name - lab result. Not all the results are numeric and this lab result variable is a character variable (so is the lab name variable). Every ID has many lab names. There are only 70 ID's but over 56,000 labs and about 500 different labs within this group .

 

 

In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to drop the results that have numeric variables.

 

I feel there are a 100 ways I can proceed.

I know have to find a way to sort through the results, and keep only those that are numeric.

I want to transpose the data, but so far this hasn't worked.

I'm thinking of writing a macro where I would feed it the ID's and have it run and gather all the test results for that ID. However I think I might be making this too complicated.

 

Any suggestions would be super appreciated! Thank You


I'm having a super hard time following this...please post sample data and output to illustrate what you want. It doesn't have to be real data, but data that illustrates your problem. 

 

If you need to do the same thing multiple times, the usual method is a BY or CLASS statement with PROC MEANS but it seems like you also need to add some data cleaning steps to get the data to a form that works for you first. 

 

I highly suggest posting what you have and what you want and we can suggest the various methods, sometimes we disagree on the best methods on here Smiley Happy

 

A quick example of one way of calculating statistics per group:

proc sql;
create table want as 
select *, mean(weight) as weight_by_sex
from sashelp.class
group by sex
order by name;
quit;

 

Occasional Contributor
Posts: 8

Re: how do I sort though character variables and ID's to make a simple table with averages?

I added a few tables the type of data I am looking at.
Also thank you for your suggestion. I'm pretty new to sas, I have seen the class command, but I will definitely look more into it. I think it can be useful!
Super User
Posts: 5,372

Re: how do I sort though character variables and ID's to make a simple table with averages?

Here's one approach.

 

Create a numeric variable holding the lab result, when that is possible:

 

numeric_result = input(lab_result, ??12.);

 

At this point, the lazy way would be to compute means of the numeric variable.  But an intermediate step would normally be appropriate:

 

proc freq data=converted;

where numeric_result=.;

tables lab_result;

run;

 

This shows all the lab results that could not be converted directly to numeric.  You may see patterns there that could be used (such as ignoring units of measure after a numeric value).

 

Once you have computed as many numeric values as possible, then generate the mean values.

Super User
Posts: 11,144

Re: how do I sort though character variables and ID's to make a simple table with averages?


Curly wrote:

 

In the end I want to have a table where the average result in each lab test is shown per ID. So I know i need to drop the results that have numeric variables.

 


I have to say that this statement is very confusing. If you drop numeric values what dos is mean to have an "average result"?

Perhaps you mean something like a MODE, the most common count of the non-numeric values?

 

Example data, input and out for the input, goes a long way to clarifying meaning.

Occasional Contributor
Posts: 8

Re: how do I sort though character variables and ID's to make a simple table with averages?

So sorry about the typo. I mean to drop the results that appear as characters, and keep all the results that are numeric. Thank you for that!
Solution
‎03-14-2017 08:44 PM
Super User
Posts: 19,192

Re: how do I sort though character variables and ID's to make a simple table with averages?

It looks to be test issue. Certain tests report numbers, others report text. Ideally you'd know which was which, but it's not difficult to approximate this. 

 

 

data numeric;
set have;

result_number  = input(labresults, ?? best12.);

run;

proc means data=numeric noprint nway; 
class id labname;
var result_number;
output out=summary mean(result_number) = average;
run;

 

Then transpose the result to the shown format - PROC TRANSPOSE. The variables without a numeric value will sort of filter themselves out. Or you can explicitly extract those results by adding a WHERE to the PROC MEANS to exclude missing data.

Occasional Contributor
Posts: 8

Re: how do I sort though character variables and ID's to make a simple table with averages?

Thank you!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 172 views
  • 3 likes
  • 4 in conversation