BookmarkSubscribeRSS Feed
dwsmith
Obsidian | Level 7

I want to take a group of data and summarize it and put the values in a new column. Consider

 

data test;
 format sn $2. ctry $3. date date9. code $3.;
 input sn ctry lat long date code;
 datalines;
 1 US 1.25 2.13 20155 abc
 1 US 1.26 2.12 20156 abc
 1 US 1.32 2.15 20157 def
 1 EU 2.53 3.12 20175 xyz
 1 EU 2.53 2.13 20176 xyz
 2 EU 5.67 10.2 20156 fgh
 2 EU 5.68 10.5 20158 fgh
 ;
run;

I want to summarize by sn by finding the mean for lat and long, the date range, the mode of code, the last code, and give me the count of occurrences. The following data summarized would create a new table that has:

sn  avg_lat  avg_long  start_dt  end_dt  code_mode code_last cnt
1    1.277    2.133     20155    20157   abc       def       3
2    5.675    10.35     20156    20158   fgh       fgh       2

I know how to do this in Python's pandas, but I cannot figure it out in SAS. I know there is proc means but I don't think (which means I don't know how to use proc means to do this if possible) it would make a table like this. Also, proc means can't do a count.

 

Does this make sense?

 

I was able to find a way do everything but the mode of code and last code for the data.

proc sql;
	create table summarized_test as
	select
		sn,
		ctry,
		avg(lat) as avg_lat,
		avg(long) as avg_long,
		min(date) as start_dt,
		max(date) as end_dt,
		count(date) as cnt
	from test
	group by sn, ctry;
quit;

I also added a ctry line which I forgot about in the OP which was my mistake.

 

 

 

 

11 REPLIES 11
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

 

You want proc means:

proc means data=test;
  by sn date;
  var lat long;
  output out=want mean=mean mode=mode min=min max=max;
run;
data_null__
Jade | Level 19

@RW9 wrote:

Hi,

 

You want proc means:

proc means data=test;
  by sn date;
  var lat long;
  output out=want mean=mean mode=mode min=min max=max;
run;

That looks fishy.  What about LONG.  You only provide names for LAT.

Astounding
PROC Star

The MODE of a character field will be difficult for PROC MEANS.  Otherwise, all things are possible.  First, the questions, though.  Should CNT be incremented even if LAT (and/or LONG) have a missing value?  If DATE has a missing value, should that be ignored when computing the START_DT?

dwsmith
Obsidian | Level 7

@Astounding, I have executed a where statement on real data set excluding these instances you speak of.

jklaverstijn
Rhodochrosite | Level 12
I will leave the answering of the stats to others. But I want to suggest you read the book on SAS dates and related formats and informats. As it is now your code will show mangled values for column date. E.g. 20155 shows up as 08MAR2015. Finding out why will prove very educational.

- Jan.
dwsmith
Obsidian | Level 7
@jklaverstijn I want my date to show up as 08MAR2015
jklaverstijn
Rhodochrosite | Level 12

Ah oke. Sorry @dwsmith for ther mis-interpretation. The coincedental "2015" in the value tricked me.

Astounding
PROC Star

Here's a PROC MEANS that gives you all statistics for the numeric fields.  Working with the CODE will require different tools.

 

proc means data=test;

var lat long date;

by id;

output out=want (drop=_type_ rename=(_freq_ = cnt)) mean = avg_lat avg_long min(date) = start_dt max(date) = end_dt;

run;

 

proc print data=want (obs=5);

run;

 

See if this makes sense, and we can look at CODE separately.

dwsmith
Obsidian | Level 7
@Astounding, I have a partial solution as well I am going to add to my OP.
Ksharp
Super User
data have;
input Patient Dosedate: mmddyy10.;
format Dosedate mmddyy10.;
cards;
101      01/01/2015
101      01/02/2015
101      01/03/2015
101      01/04/2015
102      01/05/2015
102      01/06/2015
102      01/08/2015
102      01/10/2015
;
run;
data want;
 merge have have(firstobs=2 rename=(Patient=_Patient  Dosedate=_Dosedate));
 output;
 if Patient=_Patient then do;
  do i=Dosedate+1 to _Dosedate-1;
   Dosedate=.;miss_date=i;output;
  end; 
 end;
 format miss_date mmddyy10.;
 drop i _:;
run;


Ksharp
Super User
data test;
 format sn $2. ctry $3. date date9. code $3.;
 input sn ctry lat long date code;
 datalines;
 1 US 1.25 2.13 20155 abc
 1 US 1.26 2.12 20156 abc
 1 US 1.32 2.15 20157 def
 1 EU 2.53 3.12 20175 xyz
 1 EU 2.53 2.13 20176 xyz
 2 EU 5.67 10.2 20156 fgh
 2 EU 5.68 10.5 20158 fgh
 ;
run;

proc sql;
create table temp as 
 select *,count(*) as count
  from test 
   group by sn,ctry,code;
   
create table want as
 select
		sn,
		ctry,
		avg(lat) as avg_lat,
		avg(long) as avg_long,
		min(date) as start_dt,
		max(date) as end_dt,
		(select distinct code 
		  from temp
		   where sn=a.sn and ctry=a.ctry
		     having count=max(count)) as code_mode,
	    (select distinct code 
		  from temp
		   where sn=a.sn and ctry=a.ctry
		     having date=max(date) )  as code_last,
		count(date) as cnt
	from temp as a
	group by sn, ctry;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1221 views
  • 0 likes
  • 6 in conversation