BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lamiaH
Obsidian | Level 7

Hello, 

 

I want to make data that have a date variable and variable for  counts for population from 1998 till 2016 by month/year count. I have census data for 5 years (1996, 2001, 2006, 2011, 2016) and I have the count of population for these years, now I want to calculat the population between these years (intercensal estimates) by month/year using a formula below, but I got not enough ifn arguments an dI know that is true. what other method, I can use. Please advise me, I've been stuck with this for 2 weeks now and I need to move on with the rest of analysis. 

 

r=log (end datecount/start date count)5;

then popcount= exp(log (start years)+r*n / where n= the period from the start date so in 1998 it will be the period from 1996 and so on. 

I know it is complicated but I tried a code as the following:

data int_census;

do year= 1998 to 2016;

do month 1 to 12;

*** Establish population at start of interval.   with population estimate in 1996 (etc) ***;

        p_start = ifn(year<2001, 1334110

                        ifn(year<2006,1472115,

                        ifn(year<2011,xxx06,xxx11))); *so on for the rest of the years**

*** Establish population at end of interval.  Substitute xxx01 (etc) with population estimate in 2001 (etc) ***;

        p_end = ifn(year<2001,xxx01,

                        ifn(year<2006,xxx06,

                        ifn(year<2011,xxx11,xxx16)));

*** Calculate r ***;

        r= ln(p_end/p_start)/60;

*** Calculate the number of years between loop variable YEAR and the ‘base year’.  The MOD function returns the remainder of ARG1 divided by ARG2. ***;

        years = mod(year-1996,5);

*** Calculate intercensal estimate (p_interval) for month MONTH in year YEAR ***;

p_interval = exp(ln(p_start) + r*(years*12 + [month-1]);

output;

end;

end;

run;

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Take a look at CANSIM 051-0001 - you can build the age groups you want in the add/remove data, select your geography and build your estimates back to 1971 for some numbers. 

 

Depending on what province you're in, check your provincial statistical agency for more data on their page, including their population estimates and forecasts.

 

http://www5.statcan.gc.ca/cansim/a26?lang=eng&retrLang=eng&id=0510001&&pattern=&stByVal=1&p1=1&p2=37...

 

PS if you use R, there's a CANSIM package to automatically load data from a specific CANSIM table, sadly I haven't seen anything similar for SAS yet.

View solution in original post

17 REPLIES 17
Reeza
Super User

What country are you in? 

Usually the Stats agency (Statistics Canada, US Census Bureau) has intercensal estimates...especially for population.

 

 

lamiaH
Obsidian | Level 7

Hi,

 

I'm in Canada, I want it for certain age population, I will email them then and find out. 

 

thank you

 

Reeza
Super User

Take a look at CANSIM 051-0001 - you can build the age groups you want in the add/remove data, select your geography and build your estimates back to 1971 for some numbers. 

 

Depending on what province you're in, check your provincial statistical agency for more data on their page, including their population estimates and forecasts.

 

http://www5.statcan.gc.ca/cansim/a26?lang=eng&retrLang=eng&id=0510001&&pattern=&stByVal=1&p1=1&p2=37...

 

PS if you use R, there's a CANSIM package to automatically load data from a specific CANSIM table, sadly I haven't seen anything similar for SAS yet.

lamiaH
Obsidian | Level 7

thank you so much for your reply. According to your advice, I've contacted Stats Canada and they provided the link and I have yearly data now. Now I want to creat monthly data for each year according to my equation using the estimates between each successive year. But, I've been trying to figure out how to do it through arrays and found it really hard. 

 

so my new data should look like below:

 

year | month  |monthly population estimate

 1998  1              1455666

1998   2               168889 

and so on

 

I have now the yearly estimates but need to do the monthly estimates using the yearly ones, still looking through arrays how to best to do it. Any ideas??

 

Thanks again for the advise. In regards to R, I'm just learning R recently. Have you ever heard of doing tables in R markdown through pipes and then knitting will produce the table as you specify it through knitting. 

So they use the | ............| and they put the commands of the numbers or calculations between pipes. I didn't find any materials in regardes to that, but they are using this method a lot in research.

 

Best

Lamia

 

 

mkeintz
PROC Star

You are asking to do exactly one of the purposes of sas PROC EXPAND, which is set to change the make esitmates to change the frequency of a time series (in your case from annual to monthly).

 

See proc expand documentation

 

Note you have to have a SAS/ETS license to utilize proc expand.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
lamiaH
Obsidian | Level 7

thank you so much, will look at it and try it. 

 

 

lamiaH
Obsidian | Level 7

Thank you for advising me about proc expand;

so I did the following syntax:

proc expand data= yearly out monthly

from =year to= month;

ID Year

convert count;

run;

But I got the following log message:

 

ERROR: "Duplicate time interval found at observation number 2 in the data. The current ID is

       Year=1999 and the previous 1998, which are within the same YEAR interval.

       Check that FROM=YEAR is correct for this data set, and that the ID variable Year

       contains SAS date or datetime values that correctly identify the observation"

 

I don't know how to solve this problem, as 1999 and 1998 are not the same year interva. If I remove the ID it will assume date from 1960 and gives jan1960 and so on.

I've seen someone else in this forum has the same problem but there was no accepted solution! 

Does anybody know how to solve this problem?

Thanks

 

mkeintz
PROC Star

The options "from=year to=month" tells SAS to treat the statement

   ID YEAR;

as declaring the variable YEAR to be a sas date-value (or date-time value).  That is, a numeric value representing the number of days after 01JAN 1960 (or negative values for before).  But your YEAR variable is just a 4-digit number.    The number 1998 is interpreted (via the ID YEAR statement) as 21JUN1965 (1998 days after 01jan1960), and 1999 is interpreted as 22JUN1965.  These are clearly not dates that are 1 year apart, so SAS objects.

 

Although you could modify PROC EXPAND to avoid treating YEAR as a date value,  I'd recommending creation of a new varialbe with a date value (say MYDATE below) and running PROC EXPAND accordingly:

 

data need;
  set yearly;
  mydate=mdy(12,31,year);
  format mydate date9.;
run;

proc expand data= need out=monthly  from=year to=month align=end;
  ID mydate; format mydate date9.;
  convert count / observed=(end,end);
run

 

 

The interpolated data is, by default, generated as a SPLINE, i.e. it's a smooth curve from your earliest to latest times.  If instead you want a series of connected straight lines (i.e. a straight line from 12/31/1998 to 12/31/1999, then from 12/1999 to 12/2000, etc.) you can use METHOD=JOIN  (you can look up where to put it).

 

Also note I've assume your YEAR values are end of year, so MYDATE=mdy(12,31,year).  But if you're using fiscal years ending in (say) June, you can set MYDATE=mdy(6,30,year).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
lamiaH
Obsidian | Level 7

Thank you so much for your help in this, the solution worked very well, and that helped me a lot. 

 

Thanks again, 

Lamia

Reeza
Super User

This works for me. I'm guessing you likely didn't convert your date to a SAS date. But SAS dates are numbers, so it's using 1998 etc as a SAS date which converts to consecutive dates. It may be worth reviewing how SAS stores and deals with dates. 

Good Luck!

 

data have;
input year both males females;
date_year = mdy(7, 1, year);
format date_year year4.;
cards;
1998	1410457	601845	808612
1999	1433745	613800	819945
2000	1458056	626315	831741
2001	1485165	640135	845030
2002	1516288	654292	861996
2003	1547090	668086	879004
2004	1580128	682483	897645
2005	1610582	696325	914257
2006	1649102	714278	934824
2007	1687468	733660	953808
2008	1732977	756747	976230
2009	1780489	780413	1000076
2010	1829854	804896	1024958
2011	1887092	833523	1053569
2012	1974102	876390	1097712
2013	2061041	918842	1142199
2014	2139666	957063	1182603
2015	2214067	993071	1220996
2016	2289808	1029975	1259833
;
run;

proc expand data= have out=monthly  from=year to=month align=end;
  ID date_year; format date_year date9.;
  convert both=monthly_both / observed=(end,end);
  convert females=monthly_female / observed=(end,end);
  convert males=monthly_male / observed=(end,end);
run;quit;
lamiaH
Obsidian | Level 7

Hello, 

 

My apology for late reply. thank you so much for your effort, but you didn't have to do that, since there was a previous reply by another user suggesting the proc expand and the date correction and I alreayd followed that and it was fine. Thank you again for your efforts, and for giving the solution. 

 

Best Regards, 

 

Lamia

Reeza
Super User

Can you attach your data set so we can test this? It is public info.

lamiaH
Obsidian | Level 7

Hi, 

My apology, I can't attach the data at all, for privacy purposes. the data are secured and locked. 

 

Thanks for all your help,

Lamia

Reeza
Super User

This is the census data you were asking about earlier? That's public, open data that can be released.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1102 views
  • 3 likes
  • 3 in conversation