Hello,
I want to make data that have a date variable and variable for counts for population from 1998 till 2016 by month/year count. I have census data for 5 years (1996, 2001, 2006, 2011, 2016) and I have the count of population for these years, now I want to calculat the population between these years (intercensal estimates) by month/year using a formula below, but I got not enough ifn arguments an dI know that is true. what other method, I can use. Please advise me, I've been stuck with this for 2 weeks now and I need to move on with the rest of analysis.
r=log (end datecount/start date count)5;
then popcount= exp(log (start years)+r*n / where n= the period from the start date so in 1998 it will be the period from 1996 and so on.
I know it is complicated but I tried a code as the following:
data int_census;
do year= 1998 to 2016;
do month 1 to 12;
*** Establish population at start of interval. with population estimate in 1996 (etc) ***;
p_start = ifn(year<2001, 1334110
ifn(year<2006,1472115,
ifn(year<2011,xxx06,xxx11))); *so on for the rest of the years**
*** Establish population at end of interval. Substitute xxx01 (etc) with population estimate in 2001 (etc) ***;
p_end = ifn(year<2001,xxx01,
ifn(year<2006,xxx06,
ifn(year<2011,xxx11,xxx16)));
*** Calculate r ***;
r= ln(p_end/p_start)/60;
*** Calculate the number of years between loop variable YEAR and the ‘base year’. The MOD function returns the remainder of ARG1 divided by ARG2. ***;
years = mod(year-1996,5);
*** Calculate intercensal estimate (p_interval) for month MONTH in year YEAR ***;
p_interval = exp(ln(p_start) + r*(years*12 + [month-1]);
output;
end;
end;
run;
Take a look at CANSIM 051-0001 - you can build the age groups you want in the add/remove data, select your geography and build your estimates back to 1971 for some numbers.
Depending on what province you're in, check your provincial statistical agency for more data on their page, including their population estimates and forecasts.
PS if you use R, there's a CANSIM package to automatically load data from a specific CANSIM table, sadly I haven't seen anything similar for SAS yet.
What country are you in?
Usually the Stats agency (Statistics Canada, US Census Bureau) has intercensal estimates...especially for population.
Hi,
I'm in Canada, I want it for certain age population, I will email them then and find out.
thank you
Take a look at CANSIM 051-0001 - you can build the age groups you want in the add/remove data, select your geography and build your estimates back to 1971 for some numbers.
Depending on what province you're in, check your provincial statistical agency for more data on their page, including their population estimates and forecasts.
PS if you use R, there's a CANSIM package to automatically load data from a specific CANSIM table, sadly I haven't seen anything similar for SAS yet.
thank you so much for your reply. According to your advice, I've contacted Stats Canada and they provided the link and I have yearly data now. Now I want to creat monthly data for each year according to my equation using the estimates between each successive year. But, I've been trying to figure out how to do it through arrays and found it really hard.
so my new data should look like below:
year | month |monthly population estimate
1998 1 1455666
1998 2 168889
and so on
I have now the yearly estimates but need to do the monthly estimates using the yearly ones, still looking through arrays how to best to do it. Any ideas??
Thanks again for the advise. In regards to R, I'm just learning R recently. Have you ever heard of doing tables in R markdown through pipes and then knitting will produce the table as you specify it through knitting.
So they use the | ............| and they put the commands of the numbers or calculations between pipes. I didn't find any materials in regardes to that, but they are using this method a lot in research.
Best
Lamia
You are asking to do exactly one of the purposes of sas PROC EXPAND, which is set to change the make esitmates to change the frequency of a time series (in your case from annual to monthly).
Note you have to have a SAS/ETS license to utilize proc expand.
thank you so much, will look at it and try it.
Thank you for advising me about proc expand;
so I did the following syntax:
proc expand data= yearly out monthly
from =year to= month;
ID Year
convert count;
run;
But I got the following log message:
ERROR: "Duplicate time interval found at observation number 2 in the data. The current ID is
Year=1999 and the previous 1998, which are within the same YEAR interval.
Check that FROM=YEAR is correct for this data set, and that the ID variable Year
contains SAS date or datetime values that correctly identify the observation"
I don't know how to solve this problem, as 1999 and 1998 are not the same year interva. If I remove the ID it will assume date from 1960 and gives jan1960 and so on.
I've seen someone else in this forum has the same problem but there was no accepted solution!
Does anybody know how to solve this problem?
Thanks
The options "from=year to=month" tells SAS to treat the statement
ID YEAR;
as declaring the variable YEAR to be a sas date-value (or date-time value). That is, a numeric value representing the number of days after 01JAN 1960 (or negative values for before). But your YEAR variable is just a 4-digit number. The number 1998 is interpreted (via the ID YEAR statement) as 21JUN1965 (1998 days after 01jan1960), and 1999 is interpreted as 22JUN1965. These are clearly not dates that are 1 year apart, so SAS objects.
Although you could modify PROC EXPAND to avoid treating YEAR as a date value, I'd recommending creation of a new varialbe with a date value (say MYDATE below) and running PROC EXPAND accordingly:
data need;
set yearly;
mydate=mdy(12,31,year);
format mydate date9.;
run;
proc expand data= need out=monthly from=year to=month align=end;
ID mydate; format mydate date9.;
convert count / observed=(end,end);
run
The interpolated data is, by default, generated as a SPLINE, i.e. it's a smooth curve from your earliest to latest times. If instead you want a series of connected straight lines (i.e. a straight line from 12/31/1998 to 12/31/1999, then from 12/1999 to 12/2000, etc.) you can use METHOD=JOIN (you can look up where to put it).
Also note I've assume your YEAR values are end of year, so MYDATE=mdy(12,31,year). But if you're using fiscal years ending in (say) June, you can set MYDATE=mdy(6,30,year).
Thank you so much for your help in this, the solution worked very well, and that helped me a lot.
Thanks again,
Lamia
This works for me. I'm guessing you likely didn't convert your date to a SAS date. But SAS dates are numbers, so it's using 1998 etc as a SAS date which converts to consecutive dates. It may be worth reviewing how SAS stores and deals with dates.
Good Luck!
data have;
input year both males females;
date_year = mdy(7, 1, year);
format date_year year4.;
cards;
1998 1410457 601845 808612
1999 1433745 613800 819945
2000 1458056 626315 831741
2001 1485165 640135 845030
2002 1516288 654292 861996
2003 1547090 668086 879004
2004 1580128 682483 897645
2005 1610582 696325 914257
2006 1649102 714278 934824
2007 1687468 733660 953808
2008 1732977 756747 976230
2009 1780489 780413 1000076
2010 1829854 804896 1024958
2011 1887092 833523 1053569
2012 1974102 876390 1097712
2013 2061041 918842 1142199
2014 2139666 957063 1182603
2015 2214067 993071 1220996
2016 2289808 1029975 1259833
;
run;
proc expand data= have out=monthly from=year to=month align=end;
ID date_year; format date_year date9.;
convert both=monthly_both / observed=(end,end);
convert females=monthly_female / observed=(end,end);
convert males=monthly_male / observed=(end,end);
run;quit;
Hello,
My apology for late reply. thank you so much for your effort, but you didn't have to do that, since there was a previous reply by another user suggesting the proc expand and the date correction and I alreayd followed that and it was fine. Thank you again for your efforts, and for giving the solution.
Best Regards,
Lamia
Can you attach your data set so we can test this? It is public info.
Hi,
My apology, I can't attach the data at all, for privacy purposes. the data are secured and locked.
Thanks for all your help,
Lamia
This is the census data you were asking about earlier? That's public, open data that can be released.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.