Help using Base SAS procedures

Datastepping - beyond my skills..

Accepted Solution Solved
Reply
Contributor
Posts: 43
Accepted Solution

Datastepping - beyond my skills..

Hi everyone!

I have a issue that i hope you can help me to resolve. I am trying to map out a newspapers

I have a large dataset. Over 70000 rows with approx 57 variables. The test dataset attached is a sample of 10000 randomly created, but with realistic outcomes.

This is the output that I get, and then work with in Excel. My assignment is to map a flow between these different categories. An example could be ID 4835:

Between weeks 1 and 2, hence change 2, he opened to a four week trial and therefore has the “OPENED to TRIAL_4W”. This trial runs for 4 weeks, and from week 5 to week 6, he closes and gets a “CLOSED from TRIAL_4W” mark. Seven weeks later, he opens to a 13 week trial period, and so on…

Now, what I would like SAS to do, is what I do manually in excel with pivottabels and its driving me up the wall..:

Let’s say I want to check how many of the four week trials that opened in January (change2, change3, change4), are still active 8 weeks later and in what category they are active. Then I would filter change2 to “OPENED to TRIAL_4W” and copy change8 (change2 + 8 weeks = change10), and then do a pivot table to count how they are distributed. I would repeat this procedure for change3 to change11, change4 to change12, then add up the 3 weeks (3 weeks because I don’t have 2011 data), and THEN I would have January.

As you might guess, this is a very tedious thing to do, and my errors keep on growing when doing manual excel slavework like this, simply because I have no idea how to do it in SAS.

In a perfect world, I would like a piece of SAS-code that:

  • Looks at month Y tells me how many opened to different subscriptions.
  • Conditioned on opening in month Y, how are they distributed over different subscriptions X weeks later?
  • How many weeks they had a “blank” before opening to a new subscription?

If this is possible, I think that I would be able to build something that answered the rest of the questions I have. The only problem is, I have no idea where to start.

The code that generates the output is this:

data c;

set b;

retain count;

by ID;

if first.ID then count = 0;

count = count + 1;

if numabb = " " then advarsel = 1;

run;quit;

data rotate;

set c;

by ID;

retain numabb1-numabb16;

array Anumabb(1:16) numabb1-numabb16;

if first.ID then do;

do i = 1 to 16;

      Anumabb = .;

end;

end;

Anumabb(week) = numabb;

if last.ID then output;

drop i;

run;quit;

data rotate_1;

set rotate;

array Anumabb(16) numabb1-numabb16;

do i = 1 to 16;

if Anumabb = ' ' then Anumabb = 0;

end;

drop i;

run;quit;

data pattern;

set rotate_1;

array Anumabb(16) numabb1-numabb16;

array AChange(2:16) $ 45 change2-change16;

do w = 2 to 16;


The sampledata is attached. I sincerely hope you can help!


- Toby

Attachment

Accepted Solutions
Solution
‎04-29-2012 09:38 AM
Super User
Posts: 5,092

Re: Datastepping - beyond my skills..

T,

I think I have the right idea here, and that it will be easy enough for you to adjust if not.  This approach saves all the formatting for later on, but finds the initial promotional status, and the initial subsequent subscription status.  It works with your final data set that already has transposed the data, but just as easily could be redone to work with the earlier format to the data with multiple records per ID.

You may need to adjust some of the ranges I used, like 16 to 20 = trial subscription.  But you're most familiar with what the codes mean and which should be grouped.

data want;

   set have;

   array anumabb {16} numabb1 - numabb16;

   do week=1 to 16 until (week_started_trial > .);

        if (16 <= anumabb{week} <= 20) then do;

           week_started_trial = week;

           starting_trial_status = anumabb{week};

       end;

   end;

   * May want to set a default, if there was never a trial;

   if (0 < week_started_trial < 16) then

   do week=week_started_trial+1 to 16 until (week_started_subscription > .);

        if (1 <= anumabb{week} <= 15) then do;

           week_started_subscription = week;

           starting_subscription_status = anumabb{week};

       end;

   end;

   keep id week_started_trial week_started_subscription starting_trial_status starting_subscription_status;

run;

There are lots of ways to format a table from this point.  First, let's see if this is pulling out the information you need (or at least most of it).

View solution in original post


All Replies
Super User
Posts: 5,092

Re: Datastepping - beyond my skills..

Toby,

Here is some high-level guidance.  I'm hoping you can work out the specifics.

The data set PATTERN contains all the necessary elements.  Even the arrays that might facilitate programming had been defined.  (Note that array definitions are temporary, so the ARRAY statements will need to appear in any new programs you write.)

Your mission will be to decide on the outcomes you would like to measure, and set up a new variable for each outcome.  (New variables representing outcomes will probably take on values of 0/1, but that is not a requirement if something else works better.)  Use a DATA step to add new variables to each row, measuring the outcomes for just that row.  When you're done, use another tool (such as PROC FREQ) on the resulting data set to count how often each outcome occurs.

As a general rule, if you could look at the data on paper and figure out what to do, there will be a way to tell SAS how to do it.  I hope this gives you enough guidance to work out the SAS-based approach.

Good luck.

Respected Advisor
Posts: 4,655

Re: Datastepping - beyond my skills..

Toby, I, and I assume many others like me, will not open an .XLS file found on a forum (even a reputable one such as this one) for well known security reasons. I would feel safer opening a text-only file such as .csv.

PG

PG
Super User
Posts: 9,687

Re: Datastepping - beyond my skills..

Here is a piece of code , based on my understanding .

proc import datafile='c:\sampledata.xls' out=have dbms=excel replace;

getnames=yes;run;

data January ;

set have;

where (change2 ="OPENED to TRIAL_4W" and change10 is not missing ) or

       (change3 ="OPENED to TRIAL_4W" and change11 is not missing ) or

       (change4 ="OPENED to TRIAL_4W" and change12 is not missing ) ;

run;

Ksharp

Contributor
Posts: 43

Re: Datastepping - beyond my skills..

Hi, and thanks for the anwsers.

@ Astounding - if i were to use a dummy method, the number of combinations over time would overwhelm me, and i would end up with ALOT of dummys for each time period. Down below is the code i use to categorize the different subscribtion directions.

@ PGStats, noted. Thanks. I've attached the sampledata as a .csv file! The SAS forums zips it for some reason, I hope it's OK?

@ Ksharp - Thanks for your input. That would probably work, but that would only give me a partial anwser to question two, and only for one month. I know that i can just take february data and do it the same way, but if you open the sampledata, you'll realize that the number of different combinations are quite large.

Here's the code i use for categorizing the diffrent subscribtions. This covers ALL of the possible movement, and as you can see, there are quite a few possible combinations.

data pattern;

set rotate_1;

array Anumabb(16) numabb1-numabb16;

array AChange(2:16) $ 45 change2-change16;

do w = 2 to 16;

*if Anumabb[w-1] > 0 and Anumabb[w-1] = Anumabb then AChange = 'No change';

******************************************* BEVÆGELSER ****************************************;

* NORMAL;

if Anumabb[w-1] = 1 and Anumabb = 1 then AChange = 'NORMAL_AAR to NORMAL_AAR';

if Anumabb[w-1] = 2 and Anumabb = 2 then AChange = 'NORMAL_HALVT to NORMAL_HALVT';

if Anumabb[w-1] = 3 and Anumabb = 3 then AChange = 'NORMAL_KVARTAL to NORMAL_KVARTAL';

if Anumabb[w-1] = 4 and Anumabb = 4 then AChange = 'NORMAL_MAANED to NORMAL_MAANED';

if Anumabb[w-1] = 5 and Anumabb = 5 then AChange = 'NORMAL_10W to NORMAL_10W';

if Anumabb[w-1] = 6 and Anumabb = 6 then AChange = 'NORMAL_13W to NORMAL_13W';

if Anumabb[w-1] = 7 and Anumabb = 7 then AChange = 'NORMAL_26W to NORMAL_26W';

if Anumabb[w-1] = 8 and Anumabb = 8 then AChange = 'NORMAL_52W to NORMAL_52W';

* KAMPAGNE;

if Anumabb[w-1] = 9 and Anumabb = 9 then AChange = 'CAMPAIGN_R_HAVLT to CAMPAIGN_R_HAVLT';

if Anumabb[w-1] = 10 and Anumabb = 10 then AChange = 'CAMPAIGN_R_KVARTAL to CAMPAIGN_R_KVARTAL';

if Anumabb[w-1] = 11 and Anumabb = 11 then AChange = 'CAMPAIGN_4W to CAMPAIGN_4W';

if Anumabb[w-1] = 12 and Anumabb = 12 then AChange = 'CAMPAIGN_10W to CAMPAIGN_10W';

if Anumabb[w-1] = 13 and Anumabb = 13 then AChange = 'CAMPAIGN_13W to CAMPAIGN_13W';

if Anumabb[w-1] = 14 and Anumabb = 14 then AChange = 'CAMPAIGN_26W to CAMPAIGN_26W';

* TRIAL;

if Anumabb[w-1] = 15 and Anumabb = 15 then AChange = 'TRIAL_4W to TRIAL_4W';

if Anumabb[w-1] = 16 and Anumabb = 16 then AChange = 'TRIAL_10W to TRIAL_10W';

if Anumabb[w-1] = 17 and Anumabb = 17 then AChange = 'TRIAL_13W to TRIAL_13W';

if Anumabb[w-1] = 18 and Anumabb = 18 then AChange = 'TRIAL_26W to TRIAL_26W';

* FREE;

if Anumabb[w-1] = 19 and Anumabb = 19 then AChange = 'FREE_05 to FREE_05';

if Anumabb[w-1] = 20 and Anumabb = 20 then AChange = 'FREE_06 to FREE_06';

* MIDLERTIDIGE LUKNINGER;

if Anumabb[w-1] = 21 and Anumabb = 21 then AChange = 'MIDL to MIDL';

********************************* ÅBENEDE *************************************;

* NORMAL;

if Anumabb[w-1] = 0 and Anumabb = 1 then AChange = 'OPENED to NORMAL_AAR';

if Anumabb[w-1] = 0 and Anumabb = 2 then AChange = 'OPENED to NORMAL_HALVT';

if Anumabb[w-1] = 0 and Anumabb = 3 then AChange = 'OPENED to NORMAL_KVARTAL';

if Anumabb[w-1] = 0 and Anumabb = 4 then AChange = 'OPENED to NORMAL_MAANED';

if Anumabb[w-1] = 0 and Anumabb = 5 then AChange = 'OPENED to NORMAL_10W';

if Anumabb[w-1] = 0 and Anumabb = 6 then AChange = 'OPENED to NORMAL_13W';

if Anumabb[w-1] = 0 and Anumabb = 7 then AChange = 'OPENED to NORMAL_26W';

if Anumabb[w-1] = 0 and Anumabb = 8 then AChange = 'OPENED to NORMAL_52W';

* KAMPAGNE;

if Anumabb[w-1] = 0 and Anumabb = 9 then AChange = 'OPENED to CAMPAIGN_R_HAVLT';

if Anumabb[w-1] = 0 and Anumabb = 10 then AChange = 'OPENED to CAMPAIGN_R_KVARTAL';

if Anumabb[w-1] = 0 and Anumabb = 11 then AChange = 'OPENED to CAMPAIGN_4W';

if Anumabb[w-1] = 0 and Anumabb = 12 then AChange = 'OPENED to CAMPAIGN_10W';

if Anumabb[w-1] = 0 and Anumabb = 13 then AChange = 'OPENED to CAMPAIGN_13W';

if Anumabb[w-1] = 0 and Anumabb = 14 then AChange = 'OPENED to CAMPAIGN_26W';

* TRIAL;

if Anumabb[w-1] = 0 and Anumabb = 15 then AChange = 'OPENED to TRIAL_4W';

if Anumabb[w-1] = 0 and Anumabb = 16 then AChange = 'OPENED to TRIAL_10W';

if Anumabb[w-1] = 0 and Anumabb = 17 then AChange = 'OPENED to TRIAL_13W';

if Anumabb[w-1] = 0 and Anumabb = 18 then AChange = 'OPENED to TRIAL_26W';

* FREE;

if Anumabb[w-1] = 0 and Anumabb = 19 then AChange = 'OPENED to FREE_05';

if Anumabb[w-1] = 0 and Anumabb = 20 then AChange = 'OPENED to FREE_06';

* MIDLERTIDIGE LUKNINGER;

if Anumabb[w-1] = 0 and Anumabb = 21 then AChange = 'OPENED to MIDL';

********************************* LUKKEDE *************************************;

* NORMAL;

if Anumabb[w-1] = 1 and Anumabb = 0 then AChange = 'CLOSED from NORMAL_AAR';

if Anumabb[w-1] = 2 and Anumabb = 0 then AChange = 'CLOSED from NORMAL_HALVT';

if Anumabb[w-1] = 3 and Anumabb = 0 then AChange = 'CLOSED from NORMAL_KVARTAL';

if Anumabb[w-1] = 4 and Anumabb = 0 then AChange = 'CLOSED from NORMAL_MAANED';

if Anumabb[w-1] = 5 and Anumabb = 0 then AChange = 'CLOSED from NORMAL_10W';

if Anumabb[w-1] = 6 and Anumabb = 0 then AChange = 'CLOSED from NORMAL_13W';

if Anumabb[w-1] = 7 and Anumabb = 0 then AChange = 'CLOSED from NORMAL_26W';

if Anumabb[w-1] = 8 and Anumabb = 0 then AChange = 'CLOSED from NORMAL_52W';

* KAMPAGNE;

if Anumabb[w-1] = 9 and Anumabb = 0 then AChange = 'CLOSED from CAMPAIGN_R_HAVLT';

if Anumabb[w-1] = 10 and Anumabb = 0 then AChange = 'CLOSED from CAMPAIGN_R_KVARTAL';

if Anumabb[w-1] = 11 and Anumabb = 0 then AChange = 'CLOSED from CAMPAIGN_4W';

if Anumabb[w-1] = 12 and Anumabb = 0 then AChange = 'CLOSED from CAMPAIGN_10W';

if Anumabb[w-1] = 13 and Anumabb = 0 then AChange = 'CLOSED from CAMPAIGN_13W';

if Anumabb[w-1] = 14 and Anumabb = 0 then AChange = 'CLOSED from CAMPAIGN_26W';

* TRIAL;

if Anumabb[w-1] = 15 and Anumabb = 0 then AChange = 'CLOSED from TRIAL_4W';

if Anumabb[w-1] = 16 and Anumabb = 0 then AChange = 'CLOSED from TRIAL_10W';

if Anumabb[w-1] = 17 and Anumabb = 0 then AChange = 'CLOSED from TRIAL_13W';

if Anumabb[w-1] = 18 and Anumabb = 0 then AChange = 'CLOSED from TRIAL_26W';

* FREE;

if Anumabb[w-1] = 19 and Anumabb = 0 then AChange = 'CLOSED from FREE_05';

if Anumabb[w-1] = 20 and Anumabb = 0 then AChange = 'CLOSED from FREE_06';

* MIDLERTIDIGE LUKNINGER;

if Anumabb[w-1] = 21 and Anumabb = 0 then AChange = 'CLOSED from MIDL';

******************************************* KONVERTERING FRA NORMAL ****************************************;

* NORMAL TIL NORMAL;

if Anumabb[w-1] = 5 and 5 < Anumabb < 9 then AChange = 'NORMAL_10W to NORMAL_MAG';

* MIDLERTIDIGE LUKNINGER;

if 0 < Anumabb[w-1] < 9 and Anumabb = 21 then AChange = 'NORMAL to MIDL';

******************************************* KONVERTERING FRA KAMPAGNER ****************************************;

* KAMPAGNE TIL NORMAL;

if Anumabb[w-1] = 9 and 0 < Anumabb < 9 then AChange = 'CAMPAIGN_R_HAVLT to NORMAL';

if Anumabb[w-1] = 10 and 0 < Anumabb < 9 then AChange = 'CAMPAIGN_R_KVARTAL to NORMAL';

if Anumabb[w-1] = 11 and 0 < Anumabb < 9 then AChange = 'CAMPAIGN_4W to NORMAL';

if Anumabb[w-1] = 12 and 0 < Anumabb < 9 then AChange = 'CAMPAIGN_10W to NORMAL';

if Anumabb[w-1] = 13 and 0 < Anumabb < 9 then AChange = 'CAMPAIGN_13W to NORMAL';

if Anumabb[w-1] = 14 and 0 < Anumabb < 9 then AChange = 'CAMPAIGN_26W to NORMAL';

* TRIAL TIL NORMAL;

if Anumabb[w-1] = 15 and 0 < Anumabb < 9 then AChange = 'TRIAL_4W to NORMAL';

if Anumabb[w-1] = 16 and 0 < Anumabb < 9 then AChange = 'TRIAL_10W to NORMAL';

if Anumabb[w-1] = 17 and 0 < Anumabb < 9 then AChange = 'TRIAL_13W to NORMAL';

if Anumabb[w-1] = 18 and 0 < Anumabb < 9 then AChange = 'TRIAL_26W to NORMAL';

* MIDLERTIDIGE LUKNINGER;

if 8 < Anumabb[w-1] < 15 and Anumabb = 21 then AChange = 'CAMP to MIDL';

if 14 < Anumabb[w-1] < 19 and Anumabb = 21 then AChange = 'TRIAL to MIDL';

******************************************* KONVERTERING TIL KAMPAGNE ****************************************;

* HVERVE;

if Anumabb[w-1] = 19 and Anumabb = 16 then AChange = 'FREE_05 to TRIAL_10W';

if Anumabb[w-1] = 20 and Anumabb = 16 then AChange = 'FREE_06 to TRIAL_10W';

if Anumabb[w-1] = 19 and Anumabb = 12 then AChange = 'FREE_05 to CAMPAIGN_10W';

if Anumabb[w-1] = 20 and Anumabb = 12 then AChange = 'FREE_06 to CAMPAIGN_10W';

if Anumabb[w-1] = 19 and Anumabb = 21 then AChange = 'FREE_05 to MIDL';

if Anumabb[w-1] = 20 and Anumabb = 21 then AChange = 'FREE_06 to MIDL';

* 4 UGERS;

if Anumabb[w-1] = 11 and 8 < Anumabb < 10 then AChange = 'CAMPAIGN_4W to CAMPAIGN_R';

* 10 UGERS;

if Anumabb[w-1] = 16 and Anumabb = 5 then AChange = 'TRIAL_10W to NORMAL_10W';

if Anumabb[w-1] = 12 and Anumabb = 5 then AChange = 'CAMPAIGN_10W to NORMAL_10W';

end;

drop w;

run;quit;

Im not really looking for code to tell me how or where the subscribtions are at different times. I need some help to create some code, that can:

- Conditioned on one time period (month) how many opened, and X weeks later, how many of them are still active and how are they distributed over different subscribtions.

Basicly the same questions as in my original post.

Hope this clears things up a bit..

T

Super User
Posts: 5,092

Re: Datastepping - beyond my skills..

Toby,

Yes, that helps clear things up.  Now let's clear them up the rest of the way, I'll try to construct an example of what you are looking for.  If you comment about my assumptions and questions below, there are many posters here who would finish the job (myself included).

Starting with the data set created above, a macro would specify ...

One time period.  By taking on a number from 1 to 16, this would specify which variable (numabb1 - numabb16) should be used.

The number of weeks later.  In combination with the specified time period, this would tell you which variable (change2 - change16) should be used.  For example, if the time period is 5, and the number of weeks later is 4, use change9.

The final result in this case would be a table showing the distribution of change9.  But does the time period variable get used in other ways?  Added to the title?  Other than examining change9, are there other ways to determine which subscriptions are still active?  Should the final table utilize just observations and omit others?

Understanding the problem is more than half the battle in this case.

Regular Contributor
Posts: 241

Re: Datastepping - beyond my skills..

I would reshape the data to long and use a simple data step to extract relevant observations before tabulating. The useful coding pattern in the data step is the "Double-DoW" (search sas-l like this for some examples). Below I am assuming that you have imported your excel sheet into a dataset, work.one:

     
    /* reshape to long */
    data two;
      length id week numabb 8 change $40.;
      set one;
      change1 = repeat(" ", 40-1);
      array num[1:16] numabb1-numabb16;
      array chg[1:16] $ change1-change16;
      do week = 1 to 16;
        numabb = num[week];
        change = chg[week];
        output;
        keep id week numabb change;
      end;
    run;
   
    /* what happened to those with four week trials opened in weeks 1 to 4
        after 8 (or upto the available) weeks later? 
      if there are two or more trial_4w^s opened in the period, then
      we take the latest one. See id=7098 */
    data three;
      /* double DoW */
      opened = 0;
      do until (last.id);
        set two;
        by id;
        if (1<=week<=4 and change="OPENED to TRIAL_4W") then
          opened = week;
      end;
      do until (last.id);
        set two;
        by id;
        if opened>0 and week = opened + min(opened+8, 16) then
          output;
      end;
    run;
    proc freq data=three;
      tables change/list missing;
    run;
    /* on lst
    The FREQ Procedure                               
                                                      Cumulative Cumulative
    change                          Frequency Percent  Frequency  Percent
    ------------------------------------------------------------------------
                                          228  86.36        228    86.36 
    CAMPAIGN_10W to CAMPAIGN_10W            1   0.38        229    86.74 
    CAMPAIGN_13W to CAMPAIGN_13W            1   0.38        230    87.12 
    CLOSED from FREE_05                     1   0.38        231    87.50 
    CLOSED from TRIAL_13W                   1   0.38        232    87.88 
    FREE_05 to FREE_05                      1   0.38        233    88.26 
    MIDL to MIDL                            1   0.38        234    88.64 
    NORMAL_13W to NORMAL_13W               20   7.58        254    96.21 
    NORMAL_52W to NORMAL_52W                2   0.76        256    96.97 
    NORMAL_KVARTAL to NORMAL_KVARTAL        3   1.14        259    98.11 
    OPENED to CAMPAIGN_13W                  1   0.38        260    98.48 
    OPENED to NORMAL_13W                    1   0.38        261    98.86 
    TRIAL_13W to TRIAL_13W                  2   0.76        263    99.62 
    TRIAL_26W to TRIAL_26W                  1   0.38        264   100.00
    */

Valued Guide
Posts: 765

Re: Datastepping - beyond my skills..

hi ... I used your CSV file and hopefully I understand the counts that you want ...

* make a data set;

proc import datafile='z:\sampledata.csv' out=xxx;
delimiter=';';
run;

* get rid of all those formats/informats added by IMPORT;

proc datasets lib=work nolist;
modify xxx;
format _all_;
informat _all_;
quit;

* find START in January (change2-change4) ... if START in January, find STOP;

data jan;
set xxx;
array change(2:16) change2-change16;

* did they start in January;
start = whichc('OPENED to TRIAL_4W',change2,change3,change4) + 1;
if start gt 1;

* when did they stop;
stop = whichc('CLOSED from TRIAL_4W',of change(*)) + 1;
run;

output attached ... within each January week (START 2, 3, 4) ... if STOP = 1, no CLOSED was found in the CHANGE variables


january.png
Contributor
Posts: 43

Re: Datastepping - beyond my skills..

@ Astounding

* The timeperiod variable, week or month, is when the subscribtion is started. But since change2 describes the change between week one, and week two, it's really not nessesary.

* It dosen't get added to the title, that's not really nessesary.

* A active subscribtion is any cell that isen't blank, or has a "CLOSED from X" mark.

* Optimally, something like the result that chang_y_chung@hotmail.com gets, but for all combinations and running from month to month (alternativliv week to week). I'll need to look at the Double-DoW and how the reshapeing to long looks. But the output looks promising.

Basicly i need to find out how the flow in subscriptions is. So if X amount of people open to a 10 week campaign, how many of them open to a 13 week normal subscription anytime later.Saying that, having a fixed number of weeks that i need to look ahead (like from change4 to change9) is not a satisfactory solution.

I might have a bunch of TRIAL_4W that opens in change2, then distributed over mabye 6-7 weeks, they will open to other subscribtions. I need to figure out a way to calculate the average number of weeks they are closed (blank) before they open to a new subscribtion, and then what subscribtion they open to. And then again, how many weeks they, on average, hold that subscribtion.

@ chang_y_chung - Thanks a bunch! I think i understand the transformation to long. But i'll have to look at the Double-DoW method. The output you got is near perfect. But i still need to have a "dynamic" timerange, so that is isen't locked on to 4 weeks necessarily.

@ MikeZdeb - Thanks for the input!  Im not really sure that i fully understand your output? Im not really interested in how many subscribtions close, but rather how many of the ones who opened, are still active X weeks later.. I hope that my comments above clears up my point abit more.

@ Ksharp - Just saw your post. I need to read some more about the _a and dim(_a)-8, as i'm not really sure what that does. Again, as commented in this post, i would like to have a "dynamic" timelimit, so i don't have to decide on a fixed number of weeks, but rather "Who opened in week1, how are they distributed in the weeks that i have info about now (weeks 1 to 16), how long were they blank before they opened, and what did they open to?

Thanks alot for the input, it's really helpful. Finding help to SAS datastep on google, is a bit difficult when im not even sure where to start.

T

Solution
‎04-29-2012 09:38 AM
Super User
Posts: 5,092

Re: Datastepping - beyond my skills..

T,

I think I have the right idea here, and that it will be easy enough for you to adjust if not.  This approach saves all the formatting for later on, but finds the initial promotional status, and the initial subsequent subscription status.  It works with your final data set that already has transposed the data, but just as easily could be redone to work with the earlier format to the data with multiple records per ID.

You may need to adjust some of the ranges I used, like 16 to 20 = trial subscription.  But you're most familiar with what the codes mean and which should be grouped.

data want;

   set have;

   array anumabb {16} numabb1 - numabb16;

   do week=1 to 16 until (week_started_trial > .);

        if (16 <= anumabb{week} <= 20) then do;

           week_started_trial = week;

           starting_trial_status = anumabb{week};

       end;

   end;

   * May want to set a default, if there was never a trial;

   if (0 < week_started_trial < 16) then

   do week=week_started_trial+1 to 16 until (week_started_subscription > .);

        if (1 <= anumabb{week} <= 15) then do;

           week_started_subscription = week;

           starting_subscription_status = anumabb{week};

       end;

   end;

   keep id week_started_trial week_started_subscription starting_trial_status starting_subscription_status;

run;

There are lots of ways to format a table from this point.  First, let's see if this is pulling out the information you need (or at least most of it).

Occasional Contributor TM
Occasional Contributor
Posts: 7

Re: Datastepping - beyond my skills..

@ Astounding

That works really well... Thanks alot! I completely understand the code, and from here i can easily reshape and format to the output that i desire.

It pulls out the desired startperiod of a subscribtion, and then tells me how many weeks later the other subscribtion starts. Its perfect - and i think that i can reformat it, so when they go from subscribtion two to three, i can follow.

Again, thanks!

Super Contributor
Posts: 1,636

Re: Datastepping - beyond my skills..

then please mark his code as correct answer. Thank you!

Super User
Posts: 9,687

Re: Datastepping - beyond my skills..

I think you can use ARRAY to solve ' one time period (month) how many opened, and X weeks later, how many of them are still active'.

The question followed it will be easy.

proc import datafile='c:\sampledata.xls' out=have dbms=excel replace;
getnames=yes;run;

data Open ;
set have;
array _a{*} $ change: ;
month=0;
do i=1 to dim(_a)-8;
 month+1;
 if _a{i} ="OPENED to TRIAL_4W" and not missing(_a{i+8}) then output;
end;
run;

 



Ksharp

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 680 views
  • 0 likes
  • 8 in conversation