BookmarkSubscribeRSS Feed
TMorville
Calcite | Level 5
Hi everyone!

I have some data - around 200.000 rows - of subscribtion to a newspaper. Heres an example of the data:

http://dl.dropbox.com/u/1321324/Work/example.xlsx

The variables are selfexplanatory, and dset is date in the form of DDMMYY. The data basicly shows, that someone got a newspaper at a certain date, and at given subscribtion (Abb).

Now. What i want SAS to pick up, is the change for Albert A in abb variable. That he goes from 2 weeks of Campaign, to Normal.

If possible, i would like SAS to count all the people that make a change in Abb, list the change, and count them.

This problem has been bugging me all day, so any help is appriciated.

Thanks a bunch, and have a nice day! Changed subject.


Message was edited by: TMorville
10 REPLIES 10
art297
Opal | Level 21
Toby,

First, I'd suggest deleting the example files you provide. Otherwise, you could run into some serious personal information disclosure problems and your tenure might end up being shorter than desired.

Second, since no one has responded, I'd post your question on SAS-L as well. There are a lot of SAS users there who also have to deal with similar data problems.

Third, my initial suggestion would be to add a date field to each record and keep the app information as is. If you're on 9.2, and the filenames include the date, you can use the indsname option of a set statement to capture the information in your SAS file.

That way you wouldn't need any dummy fields and wouldn't have to make up new fields each week and could continue to simply append new data.

You can always then analyze the long file or, if necessary, use proc transpose to turn it into a wide file later. Similarly, you can always build and apply a format to restructure the data and/or create dummy fields for analytical purposes.

HTH,
Art
TMorville
Calcite | Level 5
Hi Art, and thanks for your reply.

Prior to uploading the data, i rearranged colums and mixed it with another data set, so my data really makes no sense at all. But, thinking about it, publishing peoples full names might not be in the interest of anyone. Thanks for the heads up.

Thanks for the advice about indsname, that might be what i need. Im trying to learn formats, but im having some trouble applying it to this case.

I'll post it on SAS-L too. I've really looked around in there, and it diden't seem like anyone else had that problem.
Cynthia_sas
SAS Super FREQ
Hi:
When you are dealing with data, such as you describe, SAS will not automatically "detect" the change you want to track. However, you can write a program to detect a change in ABB status from one week to the next. Some folks have suggested PROC TRANSPOSE, other people have suggested using ARRAYs. Either method would work.

In either case, you need to consider how to turn data that looks something like this (what I would call long and skinny data):
[pre]
BP NAME ABB DSET
10000001 Jack J. CAMPAIGN 160710
10000001 Jack J. CAMPAIGN 230710
10000001 Jack J. CAMPAIGN 300710
10000001 Jack J. CAMPAIGN 100810
10000001 Jack J. CAMPAIGN 130810
10000006 Karen K NORMAL 230710
10000006 Karen K NORMAL 300710
10000006 Karen K NORMAL 100810
10000006 Karen K NORMAL 130810
10000007 Albert A CAMPAIGN 160710
10000007 Albert A CAMPAIGN 230710
10000007 Albert A NORMAL 300710
10000008 Lisa L CAMPAIGN 230710
10000008 Lisa L CAMPAIGN 300710
10000008 Lisa L CAMPAIGN 100810
10000008 Lisa L CAMPAIGN 130810
10000009 Kal L CAMPAIGN 230710
10000009 Kal L NORMAL 300710
10000009 Kal L NORMAL 100810
10000009 Kal L CAMPAIGN 130810
10000009 Kal L SPECIAL 150910
[/pre]

Into data that is "wide" -- so that you have 1 row per BP/NAME, like this:
[pre]
bp name dt1 dt2 dt3 dt4 dt5 abb1 abb2 abb3 abb4 abb5
10000001 Jack J. 16JUL2010 23JUL2010 30JUL2010 10AUG2010 13AUG2010 CAMPAIGN CAMPAIGN CAMPAIGN CAMPAIGN CAMPAIGN
10000006 Karen K 23JUL2010 30JUL2010 10AUG2010 13AUG2010 . NORMAL NORMAL NORMAL NORMAL
10000007 Albert A 16JUL2010 23JUL2010 30JUL2010 . . CAMPAIGN CAMPAIGN NORMAL
10000008 Lisa L 23JUL2010 30JUL2010 10AUG2010 13AUG2010 . CAMPAIGN CAMPAIGN CAMPAIGN CAMPAIGN
10000009 Kal L 23JUL2010 30JUL2010 10AUG2010 13AUG2010 15SEP2010 CAMPAIGN NORMAL NORMAL CAMPAIGN SPECIAL
[/pre]

You only have 52 weeks in any single year period, so that's not a huge array. For any single BP or NAME, you could find out how many of the ABB values were the same and how many were different just by comparing each ABB to the next ABB in the array.

Based on using an ARRAY, you could come up with a summary like this for each BP/NAME:
[pre]
bp name numsame numdiff numweeks
10000001 Jack J. 4 0 5
10000006 Karen K 3 0 4
10000007 Albert A 1 1 3
10000008 Lisa L 3 0 4
10000009 Kal L 1 3 5
[/pre]

Where NUMSAME is the number of comparisons that were the same and NUMDIFF is the number of comparisons that were different. For example, Albert A, if you eyeball the data -- has 3 total weeks of data -- between the first value and the second value, there is no change, so that's where the 1 came from for NUMSAME; but, between the second value and the third value, the values were different, so that's how NUMDIFF would get set to 1. If you needed to capture the FROM and TO values where the change occurred, you could do that too. Something like this (shown for Albert):
[pre]
bp name abbdiff datediff
10000007 Albert A from: CAMPAIGN to: NORMAL 30JUL2010
[/pre]

Using ARRAYs to turn skinny data into WIDE data will require that you understand how ARRAYs work, how the DATA step works and how BY group processing works (because you will need to build your ARRAY using FIRST.BY-variable and LAST.BY-variable processing.

A good paper that introduces ARRAY processing is this one:
http://support.sas.com/rnd/papers/sgf07/arrays1780.pdf

A good paper on basic DATA step concepts is here:
http://www2.sas.com/proceedings/sugi29/252-29.pdf

cynthia
TMorville
Calcite | Level 5
Hi Cynthia!

Wow. Thanks a bunch. Proc array is a completely new command for me. I think this will work out perfectly. I'll write back if i run into any problems.

Thanks again!
TMorville
Calcite | Level 5
Working piece of code to anyone having my problems:


data rotate;
set retain_test;
by bp_2;
retain numabb28-numabb32;
array Anumabb(28:32) numabb28-numabb32;
if first.bp_2 then do;
do i = 28 to 32;
Anumabb = .;
end;
end;
Anumabb(week) = numabb;
if last.bp_2 then output;
drop week bp_2 i;
run;

data rotate;
set rotate;
array Anumabb(5) numabb28-numabb32;
do i = 1 to 5;
if Anumabb = ' ' then Anumabb = 0;
end;
drop i;
run;

data pattern;
set rotate;
length skift $ 15;
array Anumabb(5) numabb28-numabb32;
array AChange(2:5) change2-change5;
do w = 2 to 5;
if Anumabb ne Anumabb [w-1] then AChange = 1;
else AChange = 0;
if Anumabb + Anumabb[w-1] = 3 then AChange = 2;
end;
sum_change = sum(of change:);
if sum_change = 1 then skift = 'Ja';
if sum_change = 0 then skift = 'No Change';
if sum_change = 2 then skift = 'Kamp til Norm';
drop w;
run;

HOWEVER... I've been having a problem with this part of the code:

if Anumabb + Anumabb[w-1] = 3 then AChange = 2;
end;


If i get a string of code that gives me 1 0 0 1 1, it will somehow add it to being 3? Message was edited by: TMorville
Peter_C
Rhodochrosite | Level 12
TMorville

couldn't get your code working.

Tried this based on the sample data Cynthia offered.[pre]data one_week ;
input BP $ NAME $ nameI $ ABB $ DSET ;
attrib dset format= ddmmyy. informat= ddmmyy. ;
cards ;
10000001 Jack J. CAMPAIGN 160710
10000001 Jack J. CAMPAIGN 230710
10000001 Jack J. CAMPAIGN 300710
10000001 Jack J. CAMPAIGN 100810
10000001 Jack J. CAMPAIGN 130810
10000006 Karen K NORMAL 230710
10000006 Karen K NORMAL 300710
10000006 Karen K NORMAL 100810
10000006 Karen K NORMAL 130810
10000007 Albert A CAMPAIGN 160710
10000007 Albert A CAMPAIGN 230710
10000007 Albert A NORMAL 300710
10000008 Lisa L CAMPAIGN 230710
10000008 Lisa L CAMPAIGN 300710
10000008 Lisa L CAMPAIGN 100810
10000008 Lisa L CAMPAIGN 130810
10000009 Kal L CAMPAIGN 230710
10000009 Kal L NORMAL 300710
10000009 Kal L NORMAL 100810
10000009 Kal L CAMPAIGN 130810
10000009 Kal L SPECIAL 150910
;[/pre]Then I look for change in ABB within bp.
The easiest way I know is using sorted data (on BP and date) and use the NOTSORTED option in a BY statement which uses BP and ABB, as follows[pre]proc sort ;
by bp dset ;
run ;
data changes ;
changes= 0;
do until( last.bp) ;
set ;
by bp abb notsorted ;
changes +(first.abb and not first.bp) ;
end ;
if changes ; * ignore those with no changes ;
keep bp name: changes abb dset;
rename abb = latest_abb dset=latest_dset ;
run;[/pre] With the sample data above, that collects just these two lines[pre] name latest_ latest_
Obs changes BP NAME I abb dset

1 1 10000007 Albert A NORMAL 30/07/10
2 3 10000009 Kal L SPECIAL 15/09/10[/pre]
I think that approach might be extensible, using a view over multiple weeks, like
data view/view=view ;
set one_week two_week three_week ;
by bp dset ;
run ;
and feed this view into the step which calculates the changes.

I don't think arrays are needed
or have I missed the point?

Are the individual changes needed?
Then output observations when you detect a change in ABB[pre] if first.abb and not first.bp then output ;[/pre] in the middle of relevant loop
TMorville
Calcite | Level 5
Hi Peter. Thanks a bunch. Definitely useable code. The reason my code does not work, is because cynthia and i have different variable names. I've changed the ones i use during my code, so its more of a pointer than direct anwser to the data i put forward.

I havin' some problems with arrays, in the conclusion phase. I would really like to make a simple "If X and Y then Z" but it seems this is something SAS is unable to do?

What i would like to do is, "If CAMPAIGN (week1) and NORMAL (week2) then CHANGEVAR = 1" or something similair. But i guess 'and' isen't a recognized SAS statement in that form.

Well, im off to home now - hope to see a anwser tomorrow!

PS: This is the stuff im trying to get to work:

data pattern;
set rotate;
length skift $ 15;
array Anumabb(5) numabb28-numabb32;
array AChange(2:5) change2-change5;
do w = 2 to 5;

if Anumabb ne Anumabb [w-1] then AChange = 1;
else AChange = 0;

if Anumabb[w-1] = 2 and Anumabb = 1 then AChange = 2;

end;
sum_change = sum(of change:);
if sum_change = 1 then skift = 'Ja';
if sum_change = 0 then skift = 'No Change';
if sum_change = 2 then skift = 'Kamp til Norm';
drop w;
run;

Thanks for the huge amount of help, both of you!
Cynthia_sas
SAS Super FREQ
Hi:
Of course SAS can do a simple IF statement. Your example doesn't quite follow SAS syntax and, actually, I'm not sure in WHAT language you can use that particular type of syntax. "If X and Y then Z" -- seems like BOOLEAN construction where the VARIABLE named X must be "TRUE" or 1 and the VARIABLE named Y must be "TRUE" or 1. The SAS way to code that IF statement is:
[pre]
if x=1 and y=1 then z=1; OR if x and y then z=1;
[/pre]

Of course that syntax assumes that the variable NAMES are X and Y and Z. If you mean something like this:
[pre]
if var1='X' and var2 = 'Y' then var3 = 'Z';
[/pre]

Where X, Y and Z are the variable VALUES, not the variable NAMES.

However, you have to decide what your data is and what your variables are. This construction:
"If CAMPAIGN (week1) and NORMAL (week2) then CHANGEVAR = 1"
will NOT work unless you have an ARRAY NAMED CAMPAIGN and an ARRAY NAMED NORMAL and they are indexed by the WEEK1 or WEEK2 variable.

Once you understand array processing, you could set up an array like this:
[pre]
array ab $ ab1-ab52;
[/pre]

and then the if statement would be something like this, inside a DO loop:
[pre]
do i = 1 to 51;
if ab(i) = 'CAMPAIGN' and ab(i+1) = 'NORMAL' then changevar = 1;
end;
[/pre]

The first time through the loop, you will be comparing week1 to week2 and the second time through the loop you will be compaing week2 to week3 and the third time through the loop you will be comparing week3 to week4, etc, etc. This technique assumes that you have already placed the ABB variable into a 52 member array.

But it seems to me that there's a possible logic flaw in the above logic, unless you KNOW for sure that the change you want to track is ALWAYS from CAMPAIGN to NORMAL -- what if someone starts out as NORMAL in Week1 and then changes to CAMPAIGN in Week2??? Or what happens if there's some "SPECIAL" campaign (as shown in my fake data in Kal L's rows. It seems to me it might be better to just see whether there was ANY change like this (inside a DO loop):

[pre]
do i = 1 to numweeks - 1;
if ab(i) eq ab(i+1) then numsame+1;
else if ab(i) ne ab(i+1) then do;
numdiff + 1;
datediff = dt(i+1);
abbdiff = catx(' ','from: ',ab(i),'to:',ab(i+1);
output changes;
end;
end;
output all;
[/pre]

In the above example, it shows an alternate technique -- instead of using an ARRAY that goes from 1-52, you just put everybody's ABB value and DATE value into arrays AB and DT respectively and then capture how many weeks (NUMWEEKS) they have for their particular data. So one person could have 5 weeks and another person could have 52 weeks and it wouldn't make any difference. The max ARRAY size could be 52, but you would only have a DO loop iterate for the value of NUMWEEKS for every person (or BP).

By comparing the current array member indexed by I to the next array member, indexed by I+1, you would be able to detect ANY change -- not just changes from CAMPAIGN to NORMAL, but also changes from NORMAL to CAMPAIGN or from anything to anything. The above code snippet would create 2 files -- one for ONLY the changes and another for everybody, whether there were changes or not.

Did you read the paper about arrays that I posted the link to???? It explains, quite well, how to use and index arrays. And, it also explains that the ARRAY statement (not PROC ARRAY) is part of a DATA step program.

On the other hand, Peter's solution is quite elegant - but since you said you wanted to know the NAME of the person who changed, then it seems to me that you might want to alter Peter's program (if you were going to change to that technique) to pay attention to FIRST.BP or FIRST.NAME. Using Arrays is only one possible solution. But you DO have to understand how DATA steps and VIEWS operate.

Also, do remember that there is NO need to post your question in more than one forum. Now that I know you've posted the same question in two places, I wonder whether you've added some relevant nugget of information over there.

cynthia
TMorville
Calcite | Level 5
Hi Cynthia.

Yes. Sorry about the double posting, i diden't count on anyone following this thread. Theres no real extra info in the other thread, and i wanted to delete it, but it seems i can only edit it. Well thats the last time ill do a double post on this forum.

Regards to the Arrays Tutorial, yes i did read the paper and i also found some rather good instuctional videos here:

http://www.ats.ucla.edu/stat/sas/seminars/sas_arrays/default.htm

So i've done some of my homework. But arrays is completely new for me, so i need to get in the mindset for SAS datawork, when im used to do statistics only. So in other words, the array code that i've written so far works, but i still need to comprehend completely what it can do, and how it works.

Regarding what i wanted to do with the code.

if Anumabb[w-1] = 2 and Anumabb = 1 then AChange = 2;


You can say that i've done it the opposite way that you have. Right now i only have 5 weeks, and seeing there is no week 0, then i can only detect 4 weeks, but again - SAS wont accept my "and" in the middle there.
And SAS won't eat it.

But... the code you just wrote looks like it will work like a charm. Regarding the logical flaw. It's not a problem. The subscribtion goes from CAMPAIGN, SPECIAL, XXX to NORMAL or EMPTY. Kals example will not happen in the real data.

In my code i've only used array as a datastep, and not proc array.

Well, again, thanks a bunch for the help!

Message was edited by: TMorville EDIT: OK. I will completly eat my own words. As i booted my PC and ran the code again. The troublesome "AND" statement worked.

Message was edited by: TMorville

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1187 views
  • 0 likes
  • 4 in conversation