Help summarizing attributes

deleted_user · Posted 01-05-2010 02:59 PM

Hello, I have a data set with two sorts of “attributes” (probably not using term in strict database sense).

I have a perminant attribute of the unique identifier (for example gender). I have multiple records per identifier which describe a transaction (like an item purchased).

“ID” – unique identifier
“Sex” – gender of ID person
“Product” – what was purchased (beer, diapers, pencils)
Purch_dt – date of purchase

1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008

I want to
- count the number of people who purchased both beer and diapers
- count the number of people who purchased beer and no diapers
- determine the number of women who purchased beer and diapers
- for those who purchased both, determine the time lag between the purchase of beer and diapers

I have not written successful code but had two different thoughts (but am willing to do anything that will work).

I tried proc SQL “group by ID” but my instinct in English would to say … by ID where Product = beer and Product = diapers but I do not believe this is correct.

I tried a do loop by ID and flag if Product = beer and if Product = diapers then do, by unique IDs count flags but this was cumbersome and while the code did something I am not sure it was correct.

Patrick · Posted 01-05-2010 08:28 PM

"for those who purchased both, determine the time lag between the purchase of beer and diapers"

What would be the time lag for the following case? Or would there be several time lags?
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
1234 M beer Nov 17, 2008
1234 M pencils Nov 30, 2008
1234 F diapers Dec 1, 2008
1234 F beer Dec 24, 2008

For the other 3 requirements:

data have;
infile datalines truncover;
input id sex:$1. product:$8. purch_dt anydtdte12.;
format purch_dt date9.;
datalines;
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008
;

data want;
set have;
by id;
retain flag '00';
select (product);
when ('beer') substr(flag,1,1)='1';
when ('diapers') substr(flag,2,1)='1';
otherwise;
end;
if last.id then
do;
output;
flag='00';
end;
run;

proc format;
value $flag
'00' = 'No Beer / No Diapers'
'10' = 'Beer / No Diapers'
'01' = 'No Beer / Diapers'
'11' = 'Beer / Diapers'
;
run;

proc tabulate data=want;
class sex flag;
format flag $flag.;
keylabel all='Total' n=' ';
table sex='Gender' all,(flag='Products Purchased' all)*f=best12.
;
run;

HTH
Patrick Message was edited by: Patrick

deleted_user · Posted 01-06-2010 07:21 AM

Patrick,

Thank you for your reply. I will get right to work applying the flags. As I am sure you guessed the actual data set is more complex but I tried to make my example easy. The actual data has a quality like "buy beer" but it is a one time event unlike actual grocery purchases where one could buy beer on multiple dates.

Thank you again,
Melissa

Patrick · Posted 01-07-2010 04:01 AM

Hi Melissa

The way you describe your data the following approach might also be interesting for you:

data have;
infile datalines truncover;
input id sex:$1. product:$8. purch_dt anydtdte12.;
format purch_dt date9.;
datalines;
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008
;

proc sql;
create view Vhave as
select * from have
where product in ('beer','diapers')
order by id,sex;
quit;

proc transpose data=Vhave out=want;
by id sex;
id product;
var purch_dt;
run;

data want;
set want;
TimeLagInDays=datdif(beer, diapers, 'act/act');
BeerAndDiapers= not missing(beer+diapers);
run;

options missing=' ';
proc tabulate data=want;
class sex;
var beer diapers BeerAndDiapers TimeLagInDays;
keylabel all='Total' n=' ' sum=' ';
table sex='Gender' all, (beer*n diapers*n BeerAndDiapers*sum TimeLagInDays*mean)*f=best12.
;
run;

HTH
Patrick

Peter_C · Posted 01-07-2010 02:35 PM

Melissa
another approach
when you want to bring together different kinds of events
merge the different types like[pre] data combo ;
merge whole_Lot( in=beers where=( criteria for beer event) )
whole_Lot( in=diapr where=( criteria for diapers ) ) ;
by ID ;
join = put( 10* beers + diapr, z2. ) ;
if first.ID ;
run ; [/pre]
This simple approach asumes your data are in ID order
The JOIN variable takes 3 values
11 both diapers and beers
10 beers only
01 diapers only
You might want to take out the "if first.ID; " statement to see all transactions through the merge ( but then use a test like "if _n_ > 1000 then stop; ".
If you need to examine more criteria, like the time between transactions, do as much as possible on the where clauses applied to the data of the merge.
.
Good Luck

PeterC

Help summarizing attributes

Re: Help summarizing attributes

Re: Help summarizing attributes

Re: Help summarizing attributes

Re: Help summarizing attributes

Help summarizing attributes

Re: Help summarizing attributes

Re: Help summarizing attributes

Re: Help summarizing attributes

Re: Help summarizing attributes

Click image to register for webinar

Classroom Training Available!