DATA Step, Macro, Functions and more

Median values from from repeated sampling

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Median values from from repeated sampling

I have a large dataset that has includes values of a chemical concentration, the date of testing, the site of testing, and a lot of other variable not immediately relevant to this question. Some of these values are from samples taken on the same day; sometimes up to ~100 tests/day. So, for example, you end up with a data set that looks something like this:

SiteID SampleDate Concentration
1      1/1/2000               1
1      1/1/2000               2
1      1/1/2000               3
1      1/1/2001               1
1      1/1/2002               1
1      1/1/2003               1
2      1/1/2000               1
2      1/1/2000               2   
2      1/1/2001               1
2      1/1/2001               2
3      1/1/2000               1
3      1/1/2002               1
4      1/1/2001               1
4      1/1/2002               1
4      1/1/2003               1
4      1/1/2003               2
4      1/1/2004               1
5      1/1/2003               1

In subsequent analyses I just need the median value from each day, so I am trying to 1) determine the median concentration per site per day and 2) create a new output data set that uses the calculated median value rather than the original concentration and displays the number of samples taken on the same date; using the example data above, something like this:

SiteID SampleDate      Med_Conc  SampleN
1      1/1/2000               2        3
1      1/1/2001               1        1
1      1/1/2002               1        1
1      1/1/2003               1        1
2      1/1/2000             1.5        2
2      1/1/2001             1.5        2
3      1/1/2000               1        1
3      1/1/2002               1        1
4      1/1/2001               1        1
4      1/1/2002               1        1
4      1/1/2003             1.5        2
4      1/1/2004               1        1
5      1/1/2003               1        1

I'm at a loss on how to do this, so any help would be greatly appreciated.


Accepted Solutions
Solution
‎09-05-2017 10:32 AM
PROC Star
Posts: 1,190

Re: Median values from from repeated sampling

Like this?

 

data have;
input SiteID$ SampleDate:ddmmyy10. Concentration;
format SampleDate ddmmyy10.;
datalines;
1      1/1/2000               1
1      1/1/2000               2
1      1/1/2000               3
1      1/1/2001               1
1      1/1/2002               1
1      1/1/2003               1
2      1/1/2000               1
2      1/1/2000               2   
2      1/1/2001               1
2      1/1/2001               2
3      1/1/2000               1
3      1/1/2002               1
4      1/1/2001               1
4      1/1/2002               1
4      1/1/2003               1
4      1/1/2003               2
4      1/1/2004               1
5      1/1/2003               1
;

proc sql;
   create table want as
   select SiteID
         ,SampleDate
         ,median(Concentration) as Med_Conc
         ,count(Concentration) as SampleN
   from have
   group by SiteID, SampleDate;
quit;

View solution in original post


All Replies
Solution
‎09-05-2017 10:32 AM
PROC Star
Posts: 1,190

Re: Median values from from repeated sampling

Like this?

 

data have;
input SiteID$ SampleDate:ddmmyy10. Concentration;
format SampleDate ddmmyy10.;
datalines;
1      1/1/2000               1
1      1/1/2000               2
1      1/1/2000               3
1      1/1/2001               1
1      1/1/2002               1
1      1/1/2003               1
2      1/1/2000               1
2      1/1/2000               2   
2      1/1/2001               1
2      1/1/2001               2
3      1/1/2000               1
3      1/1/2002               1
4      1/1/2001               1
4      1/1/2002               1
4      1/1/2003               1
4      1/1/2003               2
4      1/1/2004               1
5      1/1/2003               1
;

proc sql;
   create table want as
   select SiteID
         ,SampleDate
         ,median(Concentration) as Med_Conc
         ,count(Concentration) as SampleN
   from have
   group by SiteID, SampleDate;
quit;
New Contributor
Posts: 2

Re: Median values from from repeated sampling

Yep- that's exactly what I was looking for. Thanks a lot!
My sql skills are really lacking so I really appreciate it
PROC Star
Posts: 1,190

Re: Median values from from repeated sampling

Anytime, glad to help.

 

And thank you for posting a clear question with sample data and desired output data. That makes it easy to help Smiley Happy

Super User
Posts: 13,046

Re: Median values from from repeated sampling

And an alternate solution:

proc summary data=have nway;
   class SiteId  SampleDate;
   var Concentration;
   output out=want (drop=_:) median=Med_conc n=SampleN;
run;

And I second @draycut's thanks for good question style AND that the example data will create the desired output.

 

 

One advantage that Proc summary has that an SQL solution won't is the ability to create many summary statistic variables without having to explicitly name each one using the  / autoname option. You may find this handy when you have 20 variables to summarize and want mean, median , max , min, n, std and IQR for each, plus the ability to get more quantiles than SQL allows if needed.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 107 views
  • 3 likes
  • 3 in conversation