DATA Step, Macro, Functions and more

Going from vast dummy fields too a repeat/long term solution

Posts: 43

Going from vast dummy fields too a repeat/long term solution

Hello everyone!

I’ve just been employed at a newspaper to do some statistical analyses on their sales. I have sas experience from my studies, but mostly the statistical part. Alas, my sas “data” skills are not entirely up to the job in such a way that it will be a long lasting solution.

Here’s my problem:

I get a bunch of data on flat files once a week, and I’m interested in movements between a number of variables in-between these weekly data dumps.
They are separated on client basis, with a so-called “BP” number (ex. 2000008939). There are around 50000 of those unique numbers in my data. Every BP number has a number of variables attached. Such as shipping address, campaign code ect.

So now, imagine, that from one week to the next, some clients will leave the newspaper, and new ones will join the paper. That means that I have a change in the BP numbers from week to week. Those are the ones I want to match by. So far, I’ve managed to do it with simple dummy variables – say wea1 = 1 for active subscription in week one, wea2 = 2 for active subscription in week2 – and so forth.
For now I have only 5 weeks of data, and 3 variables of interest, and thus it’s manageable with a dummy for each week, and a dummy for each variable.

Imagine that in a few months have several weeks of data, each with around 7-8 variables of interest. That will lead to a dummy mess I can’t begin to imagine.

My problem is, most often, that I get lots of code that looks like this:

data wea100716;
set wea100716;
if abb = 'CAMPAIGN' then camp1 = 1;
if abb = 'NORMAL' then normal1 = 1;
if abb = 'FREE' then free1 = 1;
if abb = 'TRIAL' then trial1 = 1;
if abb = 'UNKNOWN' then unknown1 = 1;
if abb = 'BUNDLING' then bund1 = 1;
if abb = 'MEMBER G' then mem1 = 1;

data wea100723;
set wea100723;
if abb = 'CAMPAIGN' then camp2 = 1;
if abb = 'NORMAL' then normal2 = 1;
if abb = 'FREE' then free2 = 1;
if abb = 'TRIAL' then trial2 = 1;
if abb = 'UNKNOWN' then unknown2 = 1;
if abb = 'BUNDLING' then bund2 = 1;
if abb = 'MEMBER G' then mem3 = 1;

data wea100730;
set wea100730;
if abb = 'CAMPAIGN' then camp3 = 1;
if abb = 'NORMAL' then normal3 = 1;
if abb = 'FREE' then free3 = 1;
if abb = 'TRIAL' then trial3 = 1;
if abb = 'UNKNOWN' then unknown3 = 1;
if abb = 'BUNDLING' then bund3 = 1;
if abb = 'MEMBER G' then mem3 = 1;

data wea100806;
set wea100806;
if abb = 'CAMPAIGN' then camp4 = 1;
if abb = 'NORMAL' then normal4 = 1;
if abb = 'FREE' then free4 = 1;
if abb = 'TRIAL' then trial4 = 1;
if abb = 'UNKNOWN' then unknown4 = 1;
if abb = 'BUNDLING' then bund4 = 1;
if abb = 'MEMBER G' then mem4 = 1;

data wea100813;
set wea100813;
if abb = 'CAMPAIGN' then camp5 = 1;
if abb = 'NORMAL' then normal5 = 1;
if abb = 'FREE' then free5 = 1;
if abb = 'TRIAL' then trial5 = 1;
if abb = 'UNKNOWN' then unknown5 = 1;
if abb = 'BUNDLING' then bund5 = 1;
if abb = 'MEMBER G' then mem5 = 1;

Which is (I think) a very stupid way, of looking at the development in the variable “abb” which shows, what kind of subscription the clients have. If let’s say, free5 = 1, then I know that BP number x, had that kind of subscription in week 5.

One of my goals, is to calculate how many unique BP numbers go from ‘CAMPAIGN’ to ‘NORMAL’ in between the weeks. So I need to match my “dummy for client being there in week 1” with “client being there in week two” with “client having abb = CAMPAIGN in week one” to “client having abb = normal in week two”.

God. Just the sheer tough of it makes me want to quit and be a street musician.


What I would like to do, is to make some sort of “macro” or “loop” that can recognize different variables and then do a string of code, perhaps like the one above, from n=1 to n=10 for different numbers. Here is what I imagine it would look like. Where n=1 for week one, and i being some number I prespecify…. Catch my drift?

data wea100813;
set wea100813;
if abb = 'CAMPAIGN' then camp(n+i) = 1;
if abb = 'NORMAL' then normal(n+i) = 1;
if abb = 'FREE' then free(n+i) = 1;
if abb = 'TRIAL' then trial(n+i) = 1;
if abb = 'UNKNOWN' then unknown(n+i) = 1;
if abb = 'BUNDLING' then bund(n+i) = 1;
if abb = 'MEMBER G' then mem(n+i) = 1;

I just do not, want to end up with 36 week dummys, looking at 10 different variables for EACH week.

Well. I hope you get it.

Here is a link to my sascode.


I really hope you guys can gimme some pointers to some smart sas functions that would relieve me of my dummy to answers pain.

Thanks in advance!
Posts: 43

Re: Going from vast dummy fields too a repeat/long term solution

Ok. I've come a step closer. Now i have this string of code:

* your Excel file;
libname x 'MYLIBNAME';
* put the names of the sheets into a macro variable;
proc sql noprint;
select catt('x.',memname,'n') into :dsets separated by ' '
from dictionary.tables
where libname eq 'X';
* use the indsname option to grab the sheet name;
data all;
set &dsets indsname=dsn;
dset = scan(dsn,2,"'$");
libname x clear;

data sort;
set sort;
if bp_2 = ' ' then delete;

proc sort data = all out = sort;
by bp_2;

This means what i get some output that look like this:

What i would like to do, is to have a piece of code that can pick out a guy like our friend Albert, who changes from one subscribtion form to another.

I would also like to classify someone like Jack, because hes been there all five weeks.
Posts: 1,564

Re: Going from vast dummy fields too a repeat/long term solution

Unsure of you final goal, but I have modified some of your sas code to make it more efficient and show you what sort of things sas can do.

Time for some proper old-fashioned training it seems to me.


proc sort data = wea100716;
by bp_2;
where BP_2 ne '';

proc sort data = wea100723;
by bp_2;
where BP_2 ne '';

proc sort data = wea100730;
by bp_2;
where BP_2 ne '';

proc sort data = wea100806;
by bp_2;
where BP_2 ne '';

proc sort data = wea100813;
by bp_2;
where BP_2 ne '';

data udv;
array abb{5} $8;
merge wea100716(rename=(abb=abb1) in=_1)
wea100723(rename=(abb=abb2) in=_2)
wea100730(rename=(abb=abb3) in=_3)
wea100806(rename=(abb=abb4) in=_4)
wea100813(rename=(abb=abb5) in=_5);
by bp_2;
if _1 then wea1=1;
if _2 then wea2=2;
if _3 then wea3=3;
if _4 then wea4=4;
if _5 then wea5=5;
weasum = sum(wea1,wea2,wea3,wea4,wea5);
do i=1 to 5;
if abb(i) = 'CAMPAIGN' then camp + 1;
if abb(i) = 'NORMAL' then normal + 1;
Posts: 8,743

Re: Going from vast dummy fields too a repeat/long term solution

There's no need to post the same question in two different forums.

Posts: 7,363

Re: Going from vast dummy fields too a repeat/long term solution


I offered my criticism of you comment regarding cross posting the other day, but will now offer an alternative.

Why not allow people to indicate which forums in which their questions should be displayed and, if multiple forums are indicated, automatically crosspost not only the questions, but all responses?

JTOOTB (just thinking out of the box),
Posts: 8,743

Re: Going from vast dummy fields too a repeat/long term solution

I am not the forum moderator, so I can't allow anything to be "automatically" posted in two forums. But since the forum moderator does review most of the postings, I'm sure that your suggestion will be added to the suggestion box for review when the time comes to make or request enhancements to the Jive Forums software (the software that "runs" these forums).

Ask a Question
Discussion stats
  • 5 replies
  • 4 in conversation