LMan19
Calcite | Level 5

I have a large dataset that includes chemical concentration values, the date of testing, the testing site, and many other variables not immediately relevant to this question. Some sites have several samples taken on the same day, sometimes up to ~100 tests per day. So, for example, you end up with a data set that looks something like this:

SiteID SampleDate Concentration
1      1/1/2000               1
1      1/1/2000               2
1      1/1/2000               3
1      1/1/2001               1
1      1/1/2002               1
1      1/1/2003               1
2      1/1/2000               1
2      1/1/2000               2   
2      1/1/2001               1
2      1/1/2001               2
3      1/1/2000               1
3      1/1/2002               1
4      1/1/2001               1
4      1/1/2002               1
4      1/1/2003               1
4      1/1/2003               2
4      1/1/2004               1
5      1/1/2003               1

In subsequent analyses I only need the median value for each day, so I am trying to 1) determine the median concentration per site per day and 2) create a new output data set that uses the calculated median instead of the original concentration and shows the number of samples taken on that date. Using the example data above, something like this:

SiteID SampleDate      Med_Conc  SampleN
1      1/1/2000               2        3
1      1/1/2001               1        1
1      1/1/2002               1        1
1      1/1/2003               1        1
2      1/1/2000             1.5        2
2      1/1/2001             1.5        2
3      1/1/2000               1        1
3      1/1/2002               1        1
4      1/1/2001               1        1
4      1/1/2002               1        1
4      1/1/2003             1.5        2
4      1/1/2004               1        1
5      1/1/2003               1        1

I'm at a loss on how to do this, so any help would be greatly appreciated.

Accepted Solution
PeterClemmensen
Tourmaline | Level 20

Like this?

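/* Sample data from the question. Dates are read as day/month/year; */
/* use mmddyy10. instead if the real data are month/day/year.       */
data have;
input SiteID$ SampleDate:ddmmyy10. Concentration;
format SampleDate ddmmyy10.;
datalines;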
1      1/1/2000               1
1      1/1/2000               2
1      1/1/2000               3
1      1/1/2001               1
1      1/1/2002               1
1      1/1/2003               1
2      1/1/2000               1
2      1/1/2000               2   
2      1/1/2001               1
2      1/1/2001               2
3      1/1/2000               1
3      1/1/2002               1
4      1/1/2001               1
4      1/1/2002               1
4      1/1/2003               1
4      1/1/2003               2
4      1/1/2004               1
5      1/1/2003               1
;

proc sql;
   create table want as
   select SiteID
         ,SampleDate
         ,median(Concentration) as Med_Conc   /* median of that day's samples */
         ,count(Concentration) as SampleN     /* number of non-missing samples that day */
   from have
   group by SiteID, SampleDate;
quit;

LMan19
Calcite | Level 5
Yep, that's exactly what I was looking for. Thanks a lot!
My SQL skills are really lacking, so I really appreciate it.
PeterClemmensen
Tourmaline | Level 20

Anytime, glad to help.

And thank you for posting a clear question with sample data and desired output data. That makes it easy to help 🙂

ballardw
Super User

And an alternate solution:

proc summary data=have nway;     /* NWAY: keep only the SiteID*SampleDate combinations */
   class SiteId  SampleDate;
   var Concentration;
   output out=want (drop=_:) median=Med_conc n=SampleN;   /* drop=_: removes _TYPE_ and _FREQ_ */
run;

And I second @PeterClemmensen's thanks for the well-posed question, AND for example data that actually produce the desired output.

One advantage PROC SUMMARY has over an SQL solution is that the / AUTONAME option on the OUTPUT statement creates many summary-statistic variables without your having to name each one explicitly. You may find this handy when you have 20 variables to summarize and want the mean, median, max, min, N, standard deviation and IQR for each. PROC SUMMARY also offers more quantiles than SQL does, if you need them.
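A minimal sketch of the AUTONAME approach described above, assuming the same layout as HAVE from the question. The data set names HAVE_DEMO and WANT_STATS and the particular statistics chosen are illustrative assumptions, not part of the original answers. Each `keyword=` given without a name produces a variable named from the analysis variable plus the statistic (for example Concentration_Median); QRANGE= is the interquartile range and P90= shows one of the extra percentiles.

```sas
/* Hypothetical demo data in the same layout as HAVE from the question */
data have_demo;
input SiteID $ SampleDate :ddmmyy10. Concentration;
format SampleDate ddmmyy10.;
datalines;
1 1/1/2000 1
1 1/1/2000 2
1 1/1/2000 3
2 1/1/2000 1
2 1/1/2000 2
;

/* Several statistics per SiteID/SampleDate without naming each output variable */
proc summary data=have_demo nway;
   class SiteID SampleDate;
   var Concentration;          /* list more analysis variables here as needed */
   output out=want_stats (drop=_:)
          n= mean= median= min= max= std= qrange= p90=
          / autoname;          /* AUTONAME builds names like Concentration_Median */
run;
```

Adding more variables to the VAR statement multiplies the output columns automatically, which is where this saves the most typing compared with naming every column in SQL.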
