I want to take a group of data and summarize it and put the values in a new column. Consider
data test;
format sn $2. ctry $3. date date9. code $3.;
input sn ctry lat long date code;
datalines;
1 US 1.25 2.13 20155 abc
1 US 1.26 2.12 20156 abc
1 US 1.32 2.15 20157 def
1 EU 2.53 3.12 20175 xyz
1 EU 2.53 2.13 20176 xyz
2 EU 5.67 10.2 20156 fgh
2 EU 5.68 10.5 20158 fgh
;
run;
I want to summarize by sn by finding the mean for lat and long, the date range, the mode of code, the last code, and give me the count of occurrences. The following data summarized would create a new table that has:
sn avg_lat avg_long start_dt end_dt code_mode code_last cnt 1 1.277 2.133 20155 20157 abc def 3 2 5.675 10.35 20156 20158 fgh fgh 2
I know how to do this in Python's pandas, but I cannot figure it out in SAS. I know there is proc means but I don't think (which means I don't know how to use proc means to do this if possible) it would make a table like this. Also, proc means can't do a count.
Does this make sense?
I was able to find a way do everything but the mode of code and last code for the data.
proc sql;
create table summarized_test as
select
sn,
ctry,
avg(lat) as avg_lat,
avg(long) as avg_long,
min(date) as start_dt,
max(date) as end_dt,
count(date) as cnt
from test
group by sn, ctry;
quit;
I also added a ctry line which I forgot about in the OP which was my mistake.
Hi,
You want proc means:
proc means data=test; by sn date; var lat long; output out=want mean=mean mode=mode min=min max=max; run;
@RW9 wrote:
Hi,
You want proc means:
proc means data=test; by sn date; var lat long; output out=want mean=mean mode=mode min=min max=max; run;
That looks fishy. What about LONG. You only provide names for LAT.
The MODE of a character field will be difficult for PROC MEANS. Otherwise, all things are possible. First, the questions, though. Should CNT be incremented even if LAT (and/or LONG) have a missing value? If DATE has a missing value, should that be ignored when computing the START_DT?
@Astounding, I have executed a where statement on real data set excluding these instances you speak of.
Ah oke. Sorry @dwsmith for ther mis-interpretation. The coincedental "2015" in the value tricked me.
Here's a PROC MEANS that gives you all statistics for the numeric fields. Working with the CODE will require different tools.
proc means data=test;
var lat long date;
by id;
output out=want (drop=_type_ rename=(_freq_ = cnt)) mean = avg_lat avg_long min(date) = start_dt max(date) = end_dt;
run;
proc print data=want (obs=5);
run;
See if this makes sense, and we can look at CODE separately.
data have; input Patient Dosedate: mmddyy10.; format Dosedate mmddyy10.; cards; 101 01/01/2015 101 01/02/2015 101 01/03/2015 101 01/04/2015 102 01/05/2015 102 01/06/2015 102 01/08/2015 102 01/10/2015 ; run; data want; merge have have(firstobs=2 rename=(Patient=_Patient Dosedate=_Dosedate)); output; if Patient=_Patient then do; do i=Dosedate+1 to _Dosedate-1; Dosedate=.;miss_date=i;output; end; end; format miss_date mmddyy10.; drop i _:; run;
data test; format sn $2. ctry $3. date date9. code $3.; input sn ctry lat long date code; datalines; 1 US 1.25 2.13 20155 abc 1 US 1.26 2.12 20156 abc 1 US 1.32 2.15 20157 def 1 EU 2.53 3.12 20175 xyz 1 EU 2.53 2.13 20176 xyz 2 EU 5.67 10.2 20156 fgh 2 EU 5.68 10.5 20158 fgh ; run; proc sql; create table temp as select *,count(*) as count from test group by sn,ctry,code; create table want as select sn, ctry, avg(lat) as avg_lat, avg(long) as avg_long, min(date) as start_dt, max(date) as end_dt, (select distinct code from temp where sn=a.sn and ctry=a.ctry having count=max(count)) as code_mode, (select distinct code from temp where sn=a.sn and ctry=a.ctry having date=max(date) ) as code_last, count(date) as cnt from temp as a group by sn, ctry; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.