BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SoftDrink
Calcite | Level 5

Hello guys,

to get right to the point, here's my problem:

I have a dataset with hundreds of cases of people afftected by a certain disease, this dataset is simply called 'cases'. Every observation/person within this dataset has variables like zip code ('zip') and the date of diagnosis ('diagdate'), which defines when a person (observation) fell ill - VERY IMPORTANT FOR MY TASK.

These dates of diagnosis range from early 2006 till today. So every observation/case in this dataset IS sick, it's only a matter of time, WHEN they became sick!

What I wanted to do then was to count the number of cases/people who fell sick "for the first time" (they stay sick of course) for every SINGLE year within the time period from the beginning of 2009 till the end of 2013.

Why do I want to do that?

--> To see if there are any trends over time from 2009 to 2013, if the number of sick people increases/decreases or doesn't change year by year. Of course, it gets more complicated, but for now I only need the numbers for every year.

(E.g., 123 cases in 2009, 145 cases in 2010, 131 cases in 2011, 172 cases in 2012...hypotheticalyl)

I wanted to accomplish that by using a Macro that loops through every year of this time interval. Part of this Macro is an SQL query that simply counts the observations that fell sick within this certain year (first 2009, then 2010, 2011 and so on).

This again done by counting the variable 'zip' (no missing values, so it works for receiving a valid number) where an observation's date of diagnosis is between the 1st January of this year (e.g. 1st January 2009) and the 31st December of the same year (31st December 2009, respectively).

And the same for the next year, and the year after and so on.

And then print these numbers. This Macro is part of a more complicated program which is working without any errors so far. However...the counting macro doesn't work. I'm pretty sure there are several mistakes (I read of this %sysfunc thing...but I have NO clue how it really works...) and I JUST realized another one, but I just can't put it together correctly right now. I'd love to get some ideas/help how to fix this or even find a simpler/better solution for this problem...

Here's the code:

*I created these macro variables way earlier in the code to get rid of all the other observations outside of this time interval 2009-2013 by creating a dataset only with those observations whose date of diagnosis

was between 'startdate' and 'enddate' - I can't get rid of these global macro variables;

%let startdate = 01jan2009;

%let enddate = 31dec2013;

*Number of loops;

%let iterations = round((("&enddate"d - "&startdate"d)/365.25), 1);

*Initialize new date variables four counting;

%let lowerdate = &startdate; = 1st January 2009

%let upperdate = intnx('month', "&lowerdate"d, 11, 'end'); = 31st December 2009

%macro cases_per_year;

%do i = 1 %to &iterations;

    proc sql noprint;

    select count(zip)

    into :nr_obs

    from cases

    where (diagdate GE "&lowerdate"d AND diagdate LE "&upperdate"d);

    run;

    quit;

    *Increment both the lower and the upper boundary of the time interval by one year for the next iteration to count the numbers of the following year and so on;

    &lowerdate = intnx('year', "&lowerdate"d, 1, 'beginning'); = 1st January 2010

    &upperdate = intnx('year', "&upperdate"d, 1, 'end'); = 31st December 2010


    

%put &nr_obs;

%end;

%mend cases_per_year;

%cases_per_year;

When I run the code, everything works flawlessly, the creation of the cases dataset, some other calculations I've done before on these data etc... only in the end SAS is complaining about the macro:

ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand

       is required. The condition was: &iterations

ERROR: The %TO value of the %DO I loop is invalid.

ERROR: The macro CASES_PER_YEAR will stop executing.

ANY HELP is greatly appreciated! This thing makes me go crazy!

Post changed 12/29/2014 - 10.22pm

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

I'm assuming you have an ID variable for each case

data case_yearly;

set cases;

do year=year(diagdate) to year(&enddate.);

output;

end;

keep year zip id;

run;

Then run a proc freq on the output dataset.

Regarding your code:

You can't use functions in macro code, except with the use of %sysfunc. This is so you can explicitly tell the macro/SAS what is text and what is a function.

Also, here's how your macro would work, if you still wanted to go that route. I commented out the proc sql section as I didn't have data to test with.

%let startdate = 01jan2009;

%let enddate = 31dec2013;

*Number of loops;

%let iterations = %sysfunc(intck(year, "&startdate."d, "&enddate."d));

proc means data=sashelp.class; run;

*Initialize new date variables four counting;

%let lowerdate = &startdate;

%let upperdate = %sysfunc(intnx(month, "&lowerdate"d, 11, end), date9.);

%put &lowerdate.;

%put &upperdate.;

%put &iterations;

%macro cases_per_year;

%do i = 1 %to &iterations;

/*    proc sql noprint;*/

/*    select count(zip)*/

/*    into :nr_obs*/

/*    from cases*/

/*    where (diagdate GE "&lowerdate"d AND diagdate LE "&upperdate"d);*/

/*    run;*/

/*    quit;*/

    *Increment both the lower and the upper boundary of the time interval by one year for the next iteration to count the numbers of the following year and so on;

    %let lowerdate = %sysfunc( intnx(year, "&lowerdate"d, 1, beginning), date9.);

    %let upperdate = %sysfunc(intnx(year, "&upperdate"d, 1, end), date9.);

  

%put &lowerdate.;

%put &upperdate.;

%end;

%mend cases_per_year;

%cases_per_year;

View solution in original post

11 REPLIES 11
Reeza
Super User

You don't need a macro for this. Once a person has a disease they have it forever or is there an end date?

jakarman
Barite | Level 11

What is the question?

- number of person in each year (new cases and cured/ended-out of registration). this requires a step for unique ditstinct of every person. 

- the date can be formatted for you or anay period to report on. By that you can use one of the many reporting/displaying options (procs).

I cannot see how the code you have started with could be helpful.

Just restart the formulating the wanted (not too detailed) result and the dat you are having as the origin to start with.

---->-- ja karman --<-----
SoftDrink
Calcite | Level 5

Thanks for your reply! I edited my initial posting...maybe it's easier to understand now, what I want to do and what I have. The original dataset is HUGE, lots of other variables...most importantly, it's real patient data, so I can't show/upload it here without editing it/dropping other variables, which is what I might do, taking a screenshot or so...so you can better see for yourself.

Reeza
Super User

You can run a proc freq on your data by applying a format to your diagnosis date.

How big is 'big'?

Proc freq data=cases;

table diagdate/out=yearly_diagnosis;

format diagdate year4.;

run;

SoftDrink
Calcite | Level 5

Thank you, too, for your ideas! Smiley Happy Especially the one with your edited code sample seems to be helpful. However, I just took a quick glimpse at it...I'm done for today. I will dig into that tomorrow and let you know how far I got...for sure.
And 'big'...well, for my work as a student it's big. ~ 1100 observations, ~ 250 variables (most of them not necessary for me tbh, so I'll drop them anyway).

Patrick
Opal | Level 21

I understand that you can't provide the real data but try to mock-up some representative sample data (a data step creating such data) and then explain us how the expected result should look like. Without such sample data you can see that people answering your questions need to make assumptions like you having patient id's in your data and the like. It's much better and easier for us if you provide some data - and it's better for you as then you will get answers and code which are actually fit for your real data.

Not sure if below sample data is representative for what you have. May be the code will still be helpful for you. Assuming that you're rather new to SAS coding I've tried to only use simple data step processing to get to the result.

data have;

  infile datalines truncover;

  input patient_id $ zip_code $ date:date9.;

  format date date9.;

  datalines;

1 1234 01jun2011

1 1234 01jan2012

1 1234 15dec2011

1 1234 10may2013

2 1234 01feb2012

2 1234 15dec2011

3 5678 15dec2011

3 5678 10may2013

;

run;

data inter;

  set have;

  year=year(date);

run;

proc sort data=inter;

  by zip_code patient_id year;

run;

data want(keep=zip_code year count);

  set inter;

  by zip_code patient_id year;

  if first.patient_id then count+1;

  if last.zip_code then

    do;

      output;

      count=0;

    end;

run;

Reeza
Super User

I'm assuming you have an ID variable for each case

data case_yearly;

set cases;

do year=year(diagdate) to year(&enddate.);

output;

end;

keep year zip id;

run;

Then run a proc freq on the output dataset.

Regarding your code:

You can't use functions in macro code, except with the use of %sysfunc. This is so you can explicitly tell the macro/SAS what is text and what is a function.

Also, here's how your macro would work, if you still wanted to go that route. I commented out the proc sql section as I didn't have data to test with.

%let startdate = 01jan2009;

%let enddate = 31dec2013;

*Number of loops;

%let iterations = %sysfunc(intck(year, "&startdate."d, "&enddate."d));

proc means data=sashelp.class; run;

*Initialize new date variables four counting;

%let lowerdate = &startdate;

%let upperdate = %sysfunc(intnx(month, "&lowerdate"d, 11, end), date9.);

%put &lowerdate.;

%put &upperdate.;

%put &iterations;

%macro cases_per_year;

%do i = 1 %to &iterations;

/*    proc sql noprint;*/

/*    select count(zip)*/

/*    into :nr_obs*/

/*    from cases*/

/*    where (diagdate GE "&lowerdate"d AND diagdate LE "&upperdate"d);*/

/*    run;*/

/*    quit;*/

    *Increment both the lower and the upper boundary of the time interval by one year for the next iteration to count the numbers of the following year and so on;

    %let lowerdate = %sysfunc( intnx(year, "&lowerdate"d, 1, beginning), date9.);

    %let upperdate = %sysfunc(intnx(year, "&upperdate"d, 1, end), date9.);

  

%put &lowerdate.;

%put &upperdate.;

%end;

%mend cases_per_year;

%cases_per_year;

SoftDrink
Calcite | Level 5

Thanks again Reeza, your code example helped me fix the errors mine had! It's working now, I only adapted your changes, the PROC SQL section was already correct.
And thanks to you, Xia Keshan, too, your solution was definitely less complicated than mine and even lets me reuse the numbers easily since they are in a new dataset. Pretty useful. I wonder why I came up with the macro...


Still...I have another question regarding my macro approach, or rather several connected ones, just for my personal interest:


The thing is...I'm writing this program to be able to manually choose a variable number of years before I run the program, that's why I created the global macro variables 'startdate' and 'enddate'. The dataset with all the cases will be updated the next years, so there will be added more and more cases in the future. The program will then cut down the original complete dataset of cases to only the ones I actually want to have for calculations or counting their numbers, which works fine at the moment.

As you can see, I wanted to write the counted numbers of cases into a macro variable called 'nr_obs'.
How could I create a VARIABLE number of macro variables depending on the actual number of years in my time interval, whereas every macro variable ONLY contains the number of cases of ONE YEAR?
So, for example, in this case, the time interval is 5 years wide: 2009, 2010, 2011, 2012, 2013.
The macro 'cases_per_year' should then create 5 different macro variables, like nr_obs1 (contains #cases 2009), nr_obs2 (contains #cases 2010), ... and so on.
And on top of that, how could I 'transport' these macro variables out of the macro itself, maybe into a dataset or make them global, so I can reuse them later on...because right now, they're just local macro variables, right?

Phew...again, I demand a lot, but If I try to find this out on my own and look all this up, I'll spend hours and hours on it and probably still not get it to work I guess...whereas the things you told me were instantly clear to me.
So I would be really happy, if you maybe could expand Reeza's working code snippet and kind of show me how to do this.

And by the way, I created a sample case dataset and took a screenshot of it. Maybe this helps.

Posting edited - 12/30/2014 - 2.50pm


cases_example.JPG
Reeza
Super User

I strongly recommend against your macro approach. It's a sledgehammer for a staple. 

You can filter your years using either the proc freq or proc SQL method with a WHERE clause. This would also give you your dataset without the need to store your macro variables somewhere. Not to mention speed and clarity when editing your code for the future.

proc sql;

create table yearly_diagnosis1 as

    select  year(diagdate) as year,count(zip) as count

    from cases

  where calculated year between year("&startdate"d) and year("&enddate"d)

    group by calculated year;

    quit;

OR

Proc freq data=cases noprint;

  where calculated year between year("&startdate"d) and year("&enddate"d);

table diagdate*zip/out=yearly_diagnosis2;

format diagdate year4.;

run;

To your question of how to store macro variables and append them, again a sledgehammer approach. I'd create a data set each loop and append it to the final data set and not go into macro variables at all.


How would you know that nr_obs1 was aligned with year 2009? You'd have to do a calculation on the date to obtain that.


If you really wanted you could do something like the following at the end of your loop, but  before you exit the macro:



data want;

%do i = 1 %to &iterations;

year=year("&startdate"d)+&i-1;

Cases=&&&nr_obs&i.;

output;

%end;

run;

jakarman
Barite | Level 11

A nice small dataset not trouble with computer resources (that is what I care for when it is big).

So far I understand: 

- first step reduce dataset to only records having that first notion of becoming ill

  (the lowest day date for each person)  Do you have an id for each person?

- second step the reporting eg The proc freq on that reduced dataset. 
(done today too, samples when needed to be continued)

---->-- ja karman --<-----
Ksharp
Super User

If I understood what you mean, Why not simply make a year variable and count obs by it ?

proc sql;

    select  year(diagdate) as year,count(zip) as count

    from cases

    group by calculated year;

    quit;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 3202 views
  • 10 likes
  • 5 in conversation