DATA Step, Macro, Functions and more

Summarizing data in a new column

Reply
Contributor
Posts: 40

Summarizing data in a new column

[ Edited ]

I want to take a group of data and summarize it and put the values in a new column. Consider

 

data test;
 format sn $2. ctry $3. date date9. code $3.;
 input sn ctry lat long date code;
 datalines;
 1 US 1.25 2.13 20155 abc
 1 US 1.26 2.12 20156 abc
 1 US 1.32 2.15 20157 def
 1 EU 2.53 3.12 20175 xyz
 1 EU 2.53 2.13 20176 xyz
 2 EU 5.67 10.2 20156 fgh
 2 EU 5.68 10.5 20158 fgh
 ;
run;

I want to summarize by sn by finding the mean for lat and long, the date range, the mode of code, the last code, and give me the count of occurrences. The following data summarized would create a new table that has:

sn  avg_lat  avg_long  start_dt  end_dt  code_mode code_last cnt
1    1.277    2.133     20155    20157   abc       def       3
2    5.675    10.35     20156    20158   fgh       fgh       2

I know how to do this in Python's pandas, but I cannot figure it out in SAS. I know there is proc means but I don't think (which means I don't know how to use proc means to do this if possible) it would make a table like this. Also, proc means can't do a count.

 

Does this make sense?

 

I was able to find a way do everything but the mode of code and last code for the data.

proc sql;
	create table summarized_test as
	select
		sn,
		ctry,
		avg(lat) as avg_lat,
		avg(long) as avg_long,
		min(date) as start_dt,
		max(date) as end_dt,
		count(date) as cnt
	from test
	group by sn, ctry;
quit;

I also added a ctry line which I forgot about in the OP which was my mistake.

 

 

 

 

Super User
Super User
Posts: 7,970

Re: Summarizing data in a new column

Hi,

 

You want proc means:

proc means data=test;
  by sn date;
  var lat long;
  output out=want mean=mean mode=mode min=min max=max;
run;
Respected Advisor
Posts: 3,799

Re: Summarizing data in a new column


RW9 wrote:

Hi,

 

You want proc means:

proc means data=test;
  by sn date;
  var lat long;
  output out=want mean=mean mode=mode min=min max=max;
run;

That looks fishy.  What about LONG.  You only provide names for LAT.

Super User
Posts: 5,513

Re: Summarizing data in a new column

The MODE of a character field will be difficult for PROC MEANS.  Otherwise, all things are possible.  First, the questions, though.  Should CNT be incremented even if LAT (and/or LONG) have a missing value?  If DATE has a missing value, should that be ignored when computing the START_DT?

Contributor
Posts: 40

Re: Summarizing data in a new column

[ Edited ]
Posted in reply to Astounding

@Astounding, I have executed a where statement on real data set excluding these instances you speak of.

Super Contributor
Posts: 441

Re: Summarizing data in a new column

I will leave the answering of the stats to others. But I want to suggest you read the book on SAS dates and related formats and informats. As it is now your code will show mangled values for column date. E.g. 20155 shows up as 08MAR2015. Finding out why will prove very educational.

- Jan.
Contributor
Posts: 40

Re: Summarizing data in a new column

Posted in reply to jklaverstijn
@jklaverstijn I want my date to show up as 08MAR2015
Super Contributor
Posts: 441

Re: Summarizing data in a new column

Ah oke. Sorry @dwsmith for ther mis-interpretation. The coincedental "2015" in the value tricked me.

Super User
Posts: 5,513

Re: Summarizing data in a new column

Here's a PROC MEANS that gives you all statistics for the numeric fields.  Working with the CODE will require different tools.

 

proc means data=test;

var lat long date;

by id;

output out=want (drop=_type_ rename=(_freq_ = cnt)) mean = avg_lat avg_long min(date) = start_dt max(date) = end_dt;

run;

 

proc print data=want (obs=5);

run;

 

See if this makes sense, and we can look at CODE separately.

Contributor
Posts: 40

Re: Summarizing data in a new column

Posted in reply to Astounding
@Astounding, I have a partial solution as well I am going to add to my OP.
Super User
Posts: 10,035

Re: Summarizing data in a new column

data have;
input Patient Dosedate: mmddyy10.;
format Dosedate mmddyy10.;
cards;
101      01/01/2015
101      01/02/2015
101      01/03/2015
101      01/04/2015
102      01/05/2015
102      01/06/2015
102      01/08/2015
102      01/10/2015
;
run;
data want;
 merge have have(firstobs=2 rename=(Patient=_Patient  Dosedate=_Dosedate));
 output;
 if Patient=_Patient then do;
  do i=Dosedate+1 to _Dosedate-1;
   Dosedate=.;miss_date=i;output;
  end; 
 end;
 format miss_date mmddyy10.;
 drop i _:;
run;


Super User
Posts: 10,035

Re: Summarizing data in a new column

data test;
 format sn $2. ctry $3. date date9. code $3.;
 input sn ctry lat long date code;
 datalines;
 1 US 1.25 2.13 20155 abc
 1 US 1.26 2.12 20156 abc
 1 US 1.32 2.15 20157 def
 1 EU 2.53 3.12 20175 xyz
 1 EU 2.53 2.13 20176 xyz
 2 EU 5.67 10.2 20156 fgh
 2 EU 5.68 10.5 20158 fgh
 ;
run;

proc sql;
create table temp as 
 select *,count(*) as count
  from test 
   group by sn,ctry,code;
   
create table want as
 select
		sn,
		ctry,
		avg(lat) as avg_lat,
		avg(long) as avg_long,
		min(date) as start_dt,
		max(date) as end_dt,
		(select distinct code 
		  from temp
		   where sn=a.sn and ctry=a.ctry
		     having count=max(count)) as code_mode,
	    (select distinct code 
		  from temp
		   where sn=a.sn and ctry=a.ctry
		     having date=max(date) )  as code_last,
		count(date) as cnt
	from temp as a
	group by sn, ctry;
quit;

Ask a Question
Discussion stats
  • 11 replies
  • 387 views
  • 0 likes
  • 6 in conversation