BookmarkSubscribeRSS Feed
radhikaa4
Calcite | Level 5

I have the following dataset. I would like to extract first and last date per city and doing a count by city as well

 

Have:

 

Citydates
New York1/5/2019
Boston4/20/2019
Washington DC8/20/2019
Washington DC9/1/2019
Washington DC10/19/2019
Washington DC12/13/2019
Hartford5/5/2019
Pittsburgh6/4/2019
Philadelphia4/6/2019
Philadelphia6/14/2019
Philadelphia9/8/2019

 

Want

 

CityCountfirstDateLastDate
New York11/5/20191/5/2019
Boston14/20/20194/20/2019
Washington DC48/20/201912/13/2019
Hartford15/5/20195/5/2019
Pittsburgh16/4/20196/4/2019
Philadelphia34/6/20199/8/2019

 

Possibly also percent per city? like NY will have 1/11, Boston 1/11 (9.1%), etc

 

Here is waht i have done so far

proc sql;

select distinct city, count(*) as total_count, min(dates) as firstDate, max(dates) as lastDate

from table group by city;quit;

 

However when I run it, all the dates are converted into NUMBERS, not dates. How do I add n/total % ? 

 

THanks!!

7 REPLIES 7
Kurt_Bremser
Super User

Add formats to display dates correctly, and use a preliminary step to get the overall count.

proc sql noprint;
select nobs into :tc
from dictionary.tables
where libname = "WORK" and memname = "TABLE";
select
  city,
  count(*) as total_count,
  calculated total_count / &tc. as percent format=percent7.2,
  min(dates) as firstDate format=e8601da10.,
  max(dates) as lastDate format=e8601da10.
from table
group by city;
quit;

 

Edit: added "calculated" in the SQL to prevent ERROR 

PaigeMiller
Diamond | Level 26
total_count / &tc. as percent format=percent7.2,

I always point out that this calculation fails to produce the proper answer if there are missing values in the data.

--
Paige Miller
Kurt_Bremser
Super User

@PaigeMiller wrote:
total_count / &tc. as percent format=percent7.2,

I always point out that this calculation fails to produce the proper answer if there are missing values in the data.


Why?

Missing values would just end up as another group, with a correct count. The user can then decide to filter them out, or not.

See this:

proc sql noprint;
select nobs into :tc
from dictionary.tables
where libname = "WORK" and memname = "CLASS";
create table want as
select
  sex,
  count(*) as total_count,
  calculated total_count / &tc. as percent format=percent7.2
from class
group by sex;
quit;

proc print data=want noobs;
run;

Result:

Sex	total_count	percent
 	1	5.00%
F	9	45.0%
M	10	50.0%
PaigeMiller
Diamond | Level 26

@Kurt_Bremser wrote:

@PaigeMiller wrote:
total_count / &tc. as percent format=percent7.2,

I always point out that this calculation fails to produce the proper answer if there are missing values in the data.


Why?

Missing values would just end up as another group, with a correct count. The user can then decide to filter them out, or not.


Because people don't realize they have to make a decision to account for missings (or they don't even realize what their code is doing with missings). I have seen examples where people code count(*) in PROC SQL and they get the wrong answer. Or they develop the code on data that doesn't have missings, and then it fails later when run on code with missings. I consider using count(*) a dangerous thing to do.

--
Paige Miller
ballardw
Super User

You have asked several very similar questions. Please tell us if you want a data set (for further manipulation or analysis) or a report that people read.

 

Most of your questions if a report is needed can be answered by one of the reporting procedures like Proc Report or Proc tabulate:

 

data have;
infile datalines dlm=',';
input City :$25. dates mmddyy10.;
format dates mmddyy10.;
datalines;
New York,1/5/2019 
Boston,4/20/2019 
Washington DC,8/20/2019 
Washington DC,9/1/2019 
Washington DC,10/19/2019 
Washington DC,12/13/2019 
Hartford,5/5/2019 
Pittsburgh,6/4/2019 
Philadelphia,4/6/2019 
Philadelphia,6/14/2019 
Philadelphia,9/8/2019 
;

proc tabulate data=have;
   class city;
   var dates;
   tables city ,
          dates=' '*(n='Count' colpctn='%' min='First Date'*f=mmddyy10. max='Last Date'*f=mmddyy10.)
  ;
run;

If you do not specify a specific date format for a calculated result then it will default to BEST. Since SAS date values are the number of days from 1 Jan 1960 then unformatted values look like 21340 instead of a formatted date like 06/05/2018 or 05Jun2018.

data_null__
Jade | Level 19

PROC SUMMARY has features that make this easy even when the dates are character in iso8601 format.

 

data have;
   infile cards dsd;
   input City :$25. dates mmddyy10.;
   format dates yymmdd10.;
   dtc = put(dates,yymmdd10.);
   cards;
New York,1/5/2019 
Boston,4/20/2019 
Washington DC,8/20/2019 
Washington DC,9/1/2019 
Washington DC,10/19/2019 
Washington DC,12/13/2019 
Hartford,5/5/2019 
Pittsburgh,6/4/2019 
Philadelphia,4/6/2019 
Philadelphia,6/14/2019 
Philadelphia,9/8/2019 
;;;;
proc summary data=have nway missing;
   class city;
   output out=need(drop=_type_ rename=(_freq_=Count))
      idgroup(min(dtc) out(dtc)=Start)
      idgroup(max(dtc) out(dtc)=End)
      ;
   run;

image.png

ed_sas_member
Meteorite | Level 14

Hi @radhikaa4 

You can try this:

proc sql;
	create table want as
	select a.City,
   	 	   a.Count,
   	 	   a.Count/b.total as Percent format=Percent8.1,
   	 	   a.firstDate format=MMDDYY10.,
   	 	   a.LastDate  format=MMDDYY10.
	from (select City,
		  		 count(City) as Count,
		  		 min(Dates) as FirstDate,
		  		 max(Dates) as LastDate
		  from table
		  group by City) as a full join 
		 (select distinct City, count(*) as total from table) as b
	on a.City = b.City
;
quit;

You can apply a format (e.g. MMDDYY10.) directly after the calculation so that dates can be human-readable : e.g. min(Dates) as FirstDate format = MMDDYY10.

To compute percent, I have merged you first query (alias 'a') with another another one (alias 'b') that retrieve the total number of records.

 

Alias "a"Alias "a"Alias "b"Alias "b"Final table combining alias "a" and "b"Final table combining alias "a" and "b"

 

 

 

 

 

 

 

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 873 views
  • 0 likes
  • 6 in conversation