year | month | date | id | v1 | v2 |
2001 | 1 | 1 | 1 | 1 | 0 |
2001 | 1 | 2 | 2 | 0 | 1 |
2001 | 1 | 12 | 2 | 0 | 1 |
2001 | 2 | 13 | 2 | 1 | 1 |
2002 | 2 | 14 | 3 | 1 | 0 |
2002 | 3 | 1 | 3 | 0 | 1 |
2002 | 5 | 12 | 3 | 1 | 1 |
2003 | 2 | 13 | 2 | 0 | 1 |
2003 | 4 | 14 | 2 | 0 | 1 |
2003 | 5 | 1 | 2 | 1 | 1 |
2003 | 5 | 12 | 3 | 1 | 0 |
2003 | 7 | 13 | 3 | 0 | 1 |
2003 | 8 | 14 | 3 | 0 | 1 |
2003 | 9 | 1 | 3 | 1 | 1 |
2003 | 10 | 1 | 3 | 1 | 0 |
2003 | 11 | 13 | 3 | 0 | 1 |
2003 | 12 | 25 | 3 | 0 | 1 |
From this data, I want to make a different table by summing frequencies from each variable by year and moth.
The expected result will be like this.
year | month | v1_freq | v2_freq |
2001 | 1 | 1 | 2 |
2001 | 2 | 2 | 1 |
2001 | 3 | 0 | 0 |
2001 | 4 | 0 | 0 |
2001 | 5 | 0 | 0 |
2001 | 6 | 0 | 0 |
2001 | 7 | 0 | 0 |
2001 | 8 | 0 | 0 |
2001 | 9 | 0 | 0 |
2001 | 10 | 0 | 0 |
2001 | 11 | 0 | 0 |
2001 | 12 | 0 | 0 |
2002 | 1 | 0 | 0 |
2002 | 2 | 1 | 0 |
2002 | 3 | 0 | 1 |
2002 | 4 | 0 | 0 |
2002 | 5 | 1 | 1 |
2002 | 6 | 0 | 0 |
2002 | 7 | 0 | 0 |
2002 | 8 | 0 | 0 |
2002 | 9 | 0 | 0 |
2002 | 10 | 0 | 0 |
2002 | 11 | 0 | 0 |
2002 | 12 | 0 | 0 |
2003 | 1 | 0 | 0 |
2003 | 2 | 0 | 1 |
2003 | 3 | 0 | 0 |
2003 | 4 | 0 | 1 |
2003 | 5 | 2 | 1 |
2003 | 6 | 0 | 0 |
2003 | 7 | 0 | 1 |
2003 | 8 | 0 | 1 |
2003 | 9 | 1 | 1 |
2003 | 10 | 1 | 0 |
2003 | 11 | 0 | 1 |
2003 | 12 | 0 | 1 |
What codes can I use?
Thank you so much in advance!
proc summary data=have nway;
class year month;
var v1 v2;
output out=want sum=v1freq v2freq;
run;
The rows for months that have no data will not appear in the output via this method. If you need them to appear, you would have to augment your original data set so that there is at least one record for each year/month combination.
proc summary data=have nway;
class year month;
var v1 v2;
output out=want sum=v1freq v2freq;
run;
The rows for months that have no data will not appear in the output via this method. If you need them to appear, you would have to augment your original data set so that there is at least one record for each year/month combination.
Use PROC MEANS to summarize your data. You control the levels of aggregating using the CLASS statement.
If you need more than one level, ie totals or such then you want to look into WAYS and TYPES but otherwise this will get you there:
https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas
@asinusdk wrote:
year month date id v1 v2 2001 1 1 1 1 0 2001 1 2 2 0 1 2001 1 12 2 0 1 2001 2 13 2 1 1 2002 2 14 3 1 0 2002 3 1 3 0 1 2002 5 12 3 1 1 2003 2 13 2 0 1 2003 4 14 2 0 1 2003 5 1 2 1 1 2003 5 12 3 1 0 2003 7 13 3 0 1 2003 8 14 3 0 1 2003 9 1 3 1 1 2003 10 1 3 1 0 2003 11 13 3 0 1 2003 12 25 3 0 1
From this data, I want to make a different table by summing frequencies from each variable by year and moth.
The expected result will be like this.
year month v1_freq v2_freq 2001 1 1 2 2001 2 2 1 2001 3 0 0 2001 4 0 0 2001 5 0 0 2001 6 0 0 2001 7 0 0 2001 8 0 0 2001 9 0 0 2001 10 0 0 2001 11 0 0 2001 12 0 0 2002 1 0 0 2002 2 1 0 2002 3 0 1 2002 4 0 0 2002 5 1 1 2002 6 0 0 2002 7 0 0 2002 8 0 0 2002 9 0 0 2002 10 0 0 2002 11 0 0 2002 12 0 0 2003 1 0 0 2003 2 0 1 2003 3 0 0 2003 4 0 1 2003 5 2 1 2003 6 0 0 2003 7 0 1 2003 8 0 1 2003 9 1 1 2003 10 1 0 2003 11 0 1 2003 12 0 1
What codes can I use?
Thank you so much in advance!
Assuming that your input data are sorted by YEAR, MONTH, you can use a single DATA step to:
In the SAS language:
data have ;
input year month v1 v2 ;
cards ;
2001 1 1 0
2001 1 0 1
2001 1 0 1
2001 2 1 1
2002 2 1 0
2002 3 0 1
2002 5 1 1
2003 2 0 1
2003 4 0 1
2003 5 1 1
2003 5 1 0
2003 7 0 1
2003 8 0 1
2003 9 1 1
2003 10 1 0
2003 11 0 1
2003 12 0 1
run ;
data want (drop = _:) ;
set have (keep = year month v:) ;
by year month ;
v1_freq + v1 ;
v2_freq + v2 ;
if last.month ;
output ;
v1_freq = 0 ;
v2_freq = 0 ;
if _n_ < n ;
p = _n_ + 1 ;
set have (keep = year month rename = (year=_y month=_m)) point=p nobs=n ;
_dt = mdy (month, 1, year) ;
do _q = 2 to intck ("mon", _dt, mdy (_m, 1, _y)) ;
_dt = intnx ("mon", _dt, 1) ;
year = year (_dt) ;
month = month (_dt) ;
output ;
end ;
run ;
If the input data set isn't sorted, basically the same gap-filling scheme can be enacted using the hash object by:
In SAS words:
data _null_ ;
dcl hash h (ordered:"a") ;
h.definekey ("year", "month") ;
h.definedata ("year", "month", "v1_freq", "v2_freq") ;
h.definedone () ;
do until (z) ;
set have (keep = year month v:) end = z ;
if h.find() ne 0 then call missing (v1_freq, v2_freq) ;
v1_freq + v1 ;
v2_freq + v2 ;
h.replace() ;
_ldt = min (_ldt, mdy (month, 1, year)) ;
_hdt = max (_hdt, mdy (month, 1, year)) ;
end ;
do _n_ = 1 to intck ("mon", _ldt, _hdt) ;
_ldt = intnx ("mon", _ldt, 1) ;
year = year (_ldt) ;
month = month (_ldt) ;
h.ref (key:year, key:month, data:year, data:month, data:0, data:0) ;
end ;
h.output (dataset:"want") ;
run ;
Of course, the problem can be solved using either MEANS/SUMMARY, as suggested earlier, or SQL, with the aggregated result set left-joined with a separately prepared data set containing the fill-gap (YEAR,MONTH) and zero sums. Preparing such a data set is principally no different from running a DO loop similar to the DO _N_= loop above. The advantage of the hash object is that the entire processing can be done in a single step and a single pass through the input data, regardless of the input key sequence.
Paul D.
Data have ;
input year month v1 v2 ;
cards ;
2001 1 1 0
2001 1 0 1
2001 1 0 1
2001 2 1 1
2002 2 1 0
2002 3 0 1
2002 5 1 1
2003 2 0 1
2003 4 0 1
2003 5 1 1
2003 5 1 0
2003 7 0 1
2003 8 0 1
2003 9 1 1
2003 10 1 0
2003 11 0 1
2003 12 0 1
;
run ;
PROC SQL;
CREATE TABLE WANT AS
SELECT YEAR,MONTH,SUM(V1) AS FR_V1,SUM(V2) AS FR_V2 FROM HAVE GROUP BY YEAR ,MONTH;
QUIT;
If you wanted to use PROC SQL above code can help. if wanted to see all month data at least one entry is needed for each month per year.
Thanks...
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.