Hello,
I have huge dataset, trying to get the repeated ids in prior quarter in Proc SQL. Months(Jan-Aug) - 2 quarters
Given Data:
Id Month
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
Need Output:
Quarter1 Total no Ids No of ID's repeated in prior quarter
Jan-apr 6 0
may-aug 5 4
Thanks for checking
Below will need some modification in case your Month variable is not character but a SAS date/datetime value or if your source table is in a database.
You write your source table is "huge". What does that mean? Can you quantify? If it's really "huge" and performance is an issue then you'd need to tell us a bit more like: Where is the source table stored, is the data already sorted, type of variables (character or numeric, length of character variables if any, number of rows - may be best just post the result of a Proc Contents).
proc format;
invalue period
'jan', 'feb', 'mar', 'apr' = 1
'may', 'jun' , 'jul', 'aug' = 2
other=.
;
value period
1 = 'jan - apr'
2 = 'may - aug'
other= 'other'
;
run;
data have;
input id month $;
group=input(month, period.);
datalines;
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
7 feb
;
proc sql;
select
o.group format=period.,
count(distinct o.id) as cnt_id,
count(distinct i.id) as cnt_id_prev_period
from
have o
left join
(
select distinct id, group
from have
/* where group=1*/
) i
on o.id=i.id and o.group=i.group+1
group by o.group
;
quit;
Is it a requirement to use PROC SQL?
Below will need some modification in case your Month variable is not character but a SAS date/datetime value or if your source table is in a database.
You write your source table is "huge". What does that mean? Can you quantify? If it's really "huge" and performance is an issue then you'd need to tell us a bit more like: Where is the source table stored, is the data already sorted, type of variables (character or numeric, length of character variables if any, number of rows - may be best just post the result of a Proc Contents).
proc format;
invalue period
'jan', 'feb', 'mar', 'apr' = 1
'may', 'jun' , 'jul', 'aug' = 2
other=.
;
value period
1 = 'jan - apr'
2 = 'may - aug'
other= 'other'
;
run;
data have;
input id month $;
group=input(month, period.);
datalines;
1 jan
1 feb
1 mar
1 apr
1 may
1 jun
1 jul
1 aug
2 jan
2 feb
2 mar
2 apr
2 may
2 aug
3 mar
3 apr
3 jun
3 aug
4 jun
5 jan
5 may
5 jun
5 aug
6 jan
7 feb
;
proc sql;
select
o.group format=period.,
count(distinct o.id) as cnt_id,
count(distinct i.id) as cnt_id_prev_period
from
have o
left join
(
select distinct id, group
from have
/* where group=1*/
) i
on o.id=i.id and o.group=i.group+1
group by o.group
;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.