Hi all,
I have a datetime variable and would like to derive monthyear from it as datepart(datetime) with the monyy7. format applied in PROC SQL.
In a second step I group by monthyear and filter with a WHERE clause such as monthyear > 012022, but this does not work. It only works with a date literal such as > '01jan2022'd.
Also, when I check the data, the grouping is done by date, whereas I want to group by monthyear.
I thought it would group by monthyear, but internally the variable stores the full date value (e.g. 01jan2022), so with GROUP BY monthyear my counts come out per date, not per monthyear.
Note: I also tried month(datepart(date)) as month and year(datepart(date)) as year, combined as cat(month,year) as monthyear, but the result is a character value, so I cannot use the > and < operators in the WHERE clause (where monthyear>012022 does not work either).
I would like to do this grouping in PROC SQL only. I would appreciate your thoughts.
Please show your sample data and your code.
Hi @vallsas,
Let me first create sample data to have something to work with:
data have;
input datetime :datetime. x;
format datetime datetime19.;
cards;
05JAN2022:12:34:56 8
31JAN2022:13:45:00 2
05FEB2022:12:34:56 7
28FEB2022:13:45:00 5
05MAR2022:12:34:56 3
28MAR2022:13:45:00 4
05APR2022:12:34:56 6
28APR2022:13:45:00 9
;
From this HAVE dataset you want to create an aggregated table with a grouping variable MONTHYEAR by means of PROC SQL? Then MONTHYEAR must have the same value for all datetimes in a month. Hence datepart(datetime) is not suitable, even if formatted with the MONYY7. format, because the internal values, not the formatted values would be used to group observations.
But you could map all datetimes to the first day of the month:
proc sql;
select intnx('month',datepart(datetime),0) format=monyy7. as monthyear, sum(x) as s
from have
group by monthyear
having monthyear>'01JAN2022'd;
quit;
Result:
monthyear         s
-------------------
  FEB2022        12
  MAR2022         7
  APR2022        15
Note that the rows are sorted in chronological order, as they should, because the internal values of MONTHYEAR are SAS date values. This is also the reason why the HAVING clause uses a date literal.
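To see why datepart(datetime) alone does not group by month, it may help to print the unaligned and the aligned date values side by side. This is only a small sketch against the HAVE dataset above; the column names date_value and aligned_value are made up for illustration.

```sas
proc sql;
  /* date_value keeps the actual day, so it differs within a month;   */
  /* aligned_value is shifted to the first day of the month by INTNX, */
  /* so all rows of one month share the same internal value.          */
  select datetime,
         datepart(datetime) format=date9. as date_value,
         intnx('month', datepart(datetime), 0) format=date9. as aligned_value
  from have;
quit;
```

For the two January rows, date_value is 05JAN2022 and 31JAN2022, while aligned_value is 01JAN2022 for both. Applying MONYY7. to date_value only changes how it is displayed, not the value that GROUP BY compares.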
proc sql;
create table new as
select a.id, datepart(datetime) as monthyear format=monyy7.,
b.age, b.cti_id
from T1TAB a left join T2TAB b
on a.id=b.id;
quit;
proc sql;
create table new1 as
select monthyear, cti_id, count(distinct id) as countid, count(distinct age) as countage
from new
where monthyear>'01jan2022'd
group by monthyear, cti_id;
quit;
Here the results show JAN2022, but they are not grouped by monthyear. When I look at the data, it is grouped by date,
so the summary contains more observations than I expect.
So in your second PROC SQL step you would select intnx('month',monthyear,0) format=monyy7. either as a new variable (e.g. monthyear1) or again as monthyear. In the latter case you would need to refer to calculated monthyear in the GROUP BY clause. The HAVING clause would replace the WHERE clause:
group by calculated monthyear, cti_id
having monthyear>'01JAN2022'd;
Edit: Or adapt the WHERE clause to the fact that it uses the internal values of the original monthyear variable:
where monthyear>'31JAN2022'd
group by calculated monthyear, cti_id;
If variable MONTHYEAR is not contained in the input dataset HAVE, but just being created in the current SELECT statement, you need to write calculated monthyear in a WHERE clause (as you did redundantly in the GROUP BY clause). Alternatively, you can use the HAVING clause (but the WHERE clause should be more efficient). My suggested code with "where monthyear ..." referred to your second PROC SQL step where MONTHYEAR had already been created in the preceding step.
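As a rough sketch of the CALCULATED variant described above, assuming the HAVE dataset from my earlier post (where MONTHYEAR does not exist in the input and is created in the same SELECT statement), the one-step query could look like this:

```sas
proc sql;
  /* MONTHYEAR is created in this SELECT, so the WHERE clause */
  /* must refer to it as CALCULATED MONTHYEAR.                */
  select intnx('month', datepart(datetime), 0) format=monyy7. as monthyear,
         count(distinct x) as s
  from have
  where calculated monthyear > '01JAN2022'd
  group by calculated monthyear;
quit;
```

Because the calculated values are first-of-month dates, the comparison with '01JAN2022'd drops January and keeps February onwards.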
Also, are you sure that you want to include x in the GROUP BY clause when your summary statistic is count(distinct x)?
Since the group determines the result of a summary function, you cannot use this result to determine the group. You created a circular definition.
Please provide example data (as shown, in a data step with datalines), and what you want to get out of it.
Just omit S in the GROUP BY clause.
Using the HAVE dataset from my earlier post I assume you have created something like:
proc sql;
create table tt as
select datepart(datetime) format=monyy7. as monthyear, x
from have;
quit;
Now you can count the distinct values of x per month:
proc sql;
select intnx('month',monthyear,0) format=monyy7. as monthyear, count(distinct x) as s
from tt
where monthyear>'31JAN2022'd
group by calculated monthyear;
quit;
Result:
monthyear         s
-------------------
  FEB2022         2
  MAR2022         2
  APR2022         2
As shown in my first reply (just for a different summary function) you can get the result from HAVE also in one step.
Here is a variant of that using a WHERE clause instead of a HAVING clause:
proc sql;
select intnx('month',datepart(datetime),0) format=monyy7. as monthyear, count(distinct x) as s
from have
where datetime>='01FEB2022:0:0'dt
group by monthyear;
quit;
Unlike statistical procedures such as FREQ or MEANS, which group by formatted values, SQL GROUP BY uses the raw, unformatted values. So you need to create the new variable either as a character variable holding the formatted values, or use INTNX to align the dates to the first of the month.
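The character-variable option is not shown elsewhere in this thread, so here is a minimal sketch of it, assuming the HAVE dataset posted earlier. The choice of the YYMMN6. format (values like 202202) is my assumption: with the year first, the character values compare and sort in chronological order, which a month-first string such as 012022 or a MONYY7. string would not.

```sas
proc sql;
  /* PUT converts the date to a character value such as '202202', */
  /* so GROUP BY sees one value per month.                        */
  select put(datepart(datetime), yymmn6.) as monthyear,
         count(distinct x) as s
  from have
  where calculated monthyear > '202201'
  group by calculated monthyear;
quit;
```

With the sample data this should return the same three groups as the INTNX approach, just labeled 202202, 202203 and 202204. The INTNX version keeps MONTHYEAR a real SAS date, which is usually more convenient for later date arithmetic.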