Solved: How to add subset data step by step

lichee · Posted 08-31-2023 12:21 AM

Hi all,

I want to examine the data by adding cohorts step by step. For example, there are five cohorts, A-E: I want to run a regression with the first cohort, and then rerun it with one additional cohort each time. How can I do this efficiently with a macro or a loop?

proc reg data=mydata;
  model dependent=independents;
  where cohort in ('A');
run;

proc reg data=mydata;
  model dependent=independents;
  where cohort in ('A','B');
run;

proc reg data=mydata;
  model dependent=independents;
  where cohort in ('A','B','C');
run;

proc reg data=mydata;
  model dependent=independents;
  where cohort in ('A','B','C','D');
run;

proc reg data=mydata;
  model dependent=independents;
  where cohort in ('A','B','C','D','E');
run;

I think I can probably start by creating the list like this:

proc sql noprint;
select distinct quote(cohort) into :st_list separated by ','
from mydata;
quit;

Astounding · Posted 08-31-2023 11:47 AM

OK, here's a bit more of what the macro could look like. It should run right after the PROC SQL step you posted, and it assumes you change the "separated by" character from a comma to a blank (separated by ' ').

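%macro regloop;
   %local n subset;
   /* &sqlobs = number of cohorts found by the PROC SQL step */
   %do n=1 %to &sqlobs;
      /* add the n-th quoted cohort value to the running list */
      %let subset = &subset %scan(&st_list, &n, %str( ));
      proc reg data=mydata;
         model dependent = independents;
         where cohort in (&subset);
      run;
   %end;
%mend regloop;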

%regloop

This will generate 5 PROC REGs, just like you have. But the cohort values (and the actual number of PROC REGs) will depend on the data set you are processing.
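A different approach, not from this thread and swapped in here only as an alternative to the macro loop: stack the cumulative subsets into one data set and let a BY statement produce one fit per step. This is a sketch under assumptions: MYDATA, COHORT, DEPENDENT and INDEPENDENTS are the question's placeholders, while COHORT_RANK, STACKED, RANK, STEP and &ncohorts are hypothetical names made up for this example.

```sas
/* Step 1: one row per cohort, numbered in sorted order (A=1, B=2, ...). */
proc sort data=mydata(keep=cohort) out=cohort_rank nodupkey;
  by cohort;
run;

data cohort_rank;
  set cohort_rank nobs=ncohorts;
  rank = _n_;
  /* Save the number of cohorts for the DO loop in Step 2. */
  if _n_ = 1 then call symputx('ncohorts', ncohorts);
run;

/* Step 2: attach each row's cohort rank, then copy the row into every */
/* step that should include it: a row with RANK=k goes into steps k..N. */
proc sql;
  create table stacked as
    select a.*, b.rank
    from mydata as a
         inner join cohort_rank as b
         on a.cohort = b.cohort;
quit;

data stacked;
  set stacked;
  do step = rank to &ncohorts;
    output;
  end;
run;

/* Step 3: a single PROC REG fits one model per value of STEP.         */
/* STEP=1 holds cohort A only; STEP=&ncohorts holds every cohort.      */
proc sort data=stacked;
  by step;
run;

proc reg data=stacked;
  by step;
  model dependent = independents;
run;
quit;
```

The trade-off is size: rows from cohort number k appear N-k+1 times in STACKED, so for large data the macro loop may be the lighter option.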


Astounding · Posted 08-31-2023 01:51 AM

As part of your plan, note that the IN operator does not require commas. You could shoot for (for example):

where cohort in ('A' 'B' 'C');

That will simplify subsequent programming.
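A minimal sketch of the list step with a blank separator, assuming MYDATA has a character variable COHORT as in the question. TRIM inside QUOTE keeps any trailing blanks out of the quoted values, and ORDER BY 1 makes the sorted order explicit rather than relying on DISTINCT to sort.

```sas
/* Build a blank-separated list of quoted cohort values, e.g. "A" "B" "C". */
proc sql noprint;
  select distinct quote(trim(cohort))
    into :st_list separated by ' '
    from mydata
    order by 1;
quit;

/* &SQLOBS is the number of distinct values selected. */
%put NOTE: &=sqlobs &=st_list;
```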

Also note that PROC SQL automatically sets &sqlobs, which would be 5 in this case (the number of rows selected). So your program could continue by writing a macro along these lines:

%macro regloop;
%local n;
%do n=1 %to &sqlobs;

Then have the macro generate PROC REG, using the first &n items from &st_list.

Definitely needs some details worked out, but that should give you a way to approach the problem.
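One way to see what "the first &n items" means before running any regressions is a dry run that only writes each cumulative list to the log. This sketch assumes &st_list ("A" "B" "C" ...) and &sqlobs come from a PROC SQL step that has just run with a blank separator; SHOW_SUBSETS is a hypothetical macro name.

```sas
%macro show_subsets;
  %local n subset;
  %do n = 1 %to &sqlobs;
    /* Append the n-th quoted value; blank is the only delimiter. */
    %let subset = &subset %scan(&st_list, &n, %str( ));
    %put NOTE: step &n: where cohort in (&subset);
  %end;
%mend show_subsets;

%show_subsets
```

Once the log shows the expected lists, the %PUT line can be replaced with the PROC REG step.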

PaigeMiller · Posted 08-31-2023 07:57 AM

I assume that the code you showed works. What would a macro do that this code does not do? You mention "iteration", what would be iterating?

I also question the statistical methodology of adding cohort after cohort to the data sequentially, as you are doing. I would imagine a better way to see the impact of cohort is to add it to the model, so that you can have different intercepts and/or different slopes for the different cohorts.
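A sketch of that single-model idea, using PROC GLM because it accepts a CLASS statement. X is a hypothetical single predictor standing in for the "independents" placeholder; MYDATA, COHORT and DEPENDENT are the question's names.

```sas
proc glm data=mydata;
  class cohort;
  /* COHORT gives each cohort its own intercept;      */
  /* X*COHORT gives each cohort its own slope for X.  */
  model dependent = cohort x x*cohort / solution;
run;
quit;
```

The Type III test for X*COHORT then indicates whether the slopes differ across cohorts; dropping that term leaves separate intercepts with a common slope.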

--
Paige Miller

lichee · Posted 08-31-2023 08:58 AM

Thank you both! I'm testing my code but can't get it to work. I'm just not good at iteration, especially in a macro.

PaigeMiller · Posted 08-31-2023 11:00 AM

@lichee wrote:
I'm testing my code but can't get it to work. I'm just not good at iteration, especially in a macro.

It's still not clear to me what "iteration" you mean. Where in this problem is there "iteration"? Please describe in detail.

--
Paige Miller

lichee · Posted 08-31-2023 11:22 AM

I'd like to see how to add cohorts one by one given the list of cohorts. Thanks!

lichee · Posted 08-31-2023 10:56 AM

Thank you Paige! I know it's not a statistically correct way to add cohort after cohort into a regression analysis. I'm just giving a quick and simple example.
