LMan19
Calcite | Level 5

I have a large dataset that includes values of a chemical concentration, the date of testing, the site of testing, and a lot of other variables not immediately relevant to this question. Some of these values are from samples taken on the same day, sometimes up to ~100 tests/day. So, for example, you end up with a data set that looks something like this:

SiteID SampleDate Concentration
1      1/1/2000               1
1      1/1/2000               2
1      1/1/2000               3
1      1/1/2001               1
1      1/1/2002               1
1      1/1/2003               1
2      1/1/2000               1
2      1/1/2000               2   
2      1/1/2001               1
2      1/1/2001               2
3      1/1/2000               1
3      1/1/2002               1
4      1/1/2001               1
4      1/1/2002               1
4      1/1/2003               1
4      1/1/2003               2
4      1/1/2004               1
5      1/1/2003               1

In subsequent analyses I just need the median value from each day, so I am trying to 1) determine the median concentration per site per day and 2) create a new output data set that uses the calculated median value rather than the original concentration and displays the number of samples taken on the same date; using the example data above, something like this:

SiteID SampleDate      Med_Conc  SampleN
1      1/1/2000               2        3
1      1/1/2001               1        1
1      1/1/2002               1        1
1      1/1/2003               1        1
2      1/1/2000             1.5        2
2      1/1/2001             1.5        2
3      1/1/2000               1        1
3      1/1/2002               1        1
4      1/1/2001               1        1
4      1/1/2002               1        1
4      1/1/2003             1.5        2
4      1/1/2004               1        1
5      1/1/2003               1        1

I'm at a loss on how to do this, so any help would be greatly appreciated.

1 ACCEPTED SOLUTION

PeterClemmensen
Tourmaline | Level 20

Like this?

 

data have;
/* SiteID is read as character; the colon modifier applies the date informat in list input */
input SiteID$ SampleDate:ddmmyy10. Concentration;
format SampleDate ddmmyy10.;
datalines;
1      1/1/2000               1
1      1/1/2000               2
1      1/1/2000               3
1      1/1/2001               1
1      1/1/2002               1
1      1/1/2003               1
2      1/1/2000               1
2      1/1/2000               2   
2      1/1/2001               1
2      1/1/2001               2
3      1/1/2000               1
3      1/1/2002               1
4      1/1/2001               1
4      1/1/2002               1
4      1/1/2003               1
4      1/1/2003               2
4      1/1/2004               1
5      1/1/2003               1
;

proc sql;
   create table want as
   select SiteID
         ,SampleDate
         ,median(Concentration) as Med_Conc
         ,count(Concentration) as SampleN
   from have
   group by SiteID, SampleDate;
quit;
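
One detail worth checking against your real data: count(Concentration) counts only non-missing concentrations, while count(*) counts every row in the group. A minimal sketch of the difference, assuming the same have table as above; the names want_check and RowsN are made up for illustration:

```sas
proc sql;
   create table want_check as
   select SiteID
         ,SampleDate
         ,median(Concentration) as Med_Conc
         ,count(Concentration)  as SampleN   /* non-missing concentrations only */
         ,count(*)              as RowsN     /* every row in the group */
   from have
   group by SiteID, SampleDate;
quit;
```

If SampleN and RowsN differ for a site and date, that group contains missing concentrations. With the example data above the two columns should match, since no values are missing.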

4 REPLIES 4
LMan19
Calcite | Level 5
Yep- that's exactly what I was looking for. Thanks a lot!
My sql skills are really lacking so I really appreciate it
PeterClemmensen
Tourmaline | Level 20

Anytime, glad to help.

 

And thank you for posting a clear question with sample data and desired output data. That makes it easy to help 🙂

ballardw
Super User

And an alternate solution:

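proc summary data=have nway;   /* nway: keep only the rows for the full SiteID by SampleDate combination */
   class SiteID SampleDate;
   var Concentration;
   /* drop=_: removes the automatic _TYPE_ and _FREQ_ variables */
   output out=want (drop=_:) median=Med_Conc n=SampleN;
run;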

And I second @PeterClemmensen's thanks for good question style AND that the example data will create the desired output.

 

 

One advantage PROC SUMMARY has over an SQL solution is the / autoname option, which names the summary statistic variables for you so you don't have to name each one explicitly. You may find this handy when you have 20 variables to summarize and want mean, median, max, min, n, std and IQR (the QRANGE statistic) for each. PROC SUMMARY can also produce more quantiles than SQL allows, if needed.
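
A rough sketch of the autoname approach, assuming the same have data set; the output name want_stats is made up for illustration, and the particular statistics are just an example selection:

```sas
proc summary data=have nway;
   class SiteID SampleDate;
   var Concentration;   /* list all analysis variables here */
   /* autoname builds each output name from the variable name plus the statistic */
   output out=want_stats (drop=_type_ _freq_)
          n= mean= median= min= max= std= qrange= p10= p90= / autoname;
run;
```

Each statistic becomes its own variable, with names along the lines of Concentration_Mean and Concentration_Median. Adding more variables to the VAR statement produces the same set of statistics for each one without any extra naming.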
