DATA Step, Macro, Functions and more

Help summarizing attributes

Reply
N/A
Posts: 0

Help summarizing attributes

Hello, I have a data set with two sorts of “attributes” (probably not using term in strict database sense).

I have a perminant attribute of the unique identifier (for example gender). I have multiple records per identifier which describe a transaction (like an item purchased).

“ID” – unique identifier
“Sex” – gender of ID person
“Product” – what was purchased (beer, diapers, pencils)
Purch_dt – date of purchase

1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008

I want to
- count the number of people who purchased both beer and diapers
- count the number of people who purchased beer and no diapers
- determine the number of women who purchased beer and diapers
- for those who purchased both, determine the time lag between the purchase of beer and diapers

I have not written successful code but had two different thoughts (but am willing to do anything that will work).

I tried proc SQL “group by ID” but my instinct in English would to say … by ID where Product = beer and Product = diapers but I do not believe this is correct.

I tried a do loop by ID and flag if Product = beer and if Product = diapers then do, by unique IDs count flags but this was cumbersome and while the code did something I am not sure it was correct.
Respected Advisor
Posts: 4,173

Re: Help summarizing attributes

Posted in reply to deleted_user
"for those who purchased both, determine the time lag between the purchase of beer and diapers"


What would be the time lag for the following case? Or would there be several time lags?
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
1234 M beer Nov 17, 2008
1234 M pencils Nov 30, 2008
1234 F diapers Dec 1, 2008
1234 F beer Dec 24, 2008



For the other 3 requirements:

data have;
infile datalines truncover;
input id sex:$1. product:$8. purch_dt anydtdte12.;
format purch_dt date9.;
datalines;
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008
;

data want;
set have;
by id;
retain flag '00';
select (product);
when ('beer') substr(flag,1,1)='1';
when ('diapers') substr(flag,2,1)='1';
otherwise;
end;
if last.id then
do;
output;
flag='00';
end;
run;

proc format;
value $flag
'00' = 'No Beer / No Diapers'
'10' = 'Beer / No Diapers'
'01' = 'No Beer / Diapers'
'11' = 'Beer / Diapers'
;
run;

proc tabulate data=want;
class sex flag;
format flag $flag.;
keylabel all='Total' n=' ';
table sex='Gender' all,(flag='Products Purchased' all)*f=best12.
;
run;

HTH
Patrick Message was edited by: Patrick
N/A
Posts: 0

Re: Help summarizing attributes

Patrick,

Thank you for your reply. I will get right to work applying the flags. As I am sure you guessed the actual data set is more complex but I tried to make my example easy. The actual data has a quality like "buy beer" but it is a one time event unlike actual grocery purchases where one could buy beer on multiple dates.

Thank you again,
Melissa
Respected Advisor
Posts: 4,173

Re: Help summarizing attributes

Posted in reply to deleted_user
Hi Melissa

The way you describe your data the following approach might also be interesting for you:


data have;
infile datalines truncover;
input id sex:$1. product:$8. purch_dt anydtdte12.;
format purch_dt date9.;
datalines;
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008
;

proc sql;
create view Vhave as
select * from have
where product in ('beer','diapers')
order by id,sex;
quit;

proc transpose data=Vhave out=want;
by id sex;
id product;
var purch_dt;
run;

data want;
set want;
TimeLagInDays=datdif(beer, diapers, 'act/act');
BeerAndDiapers= not missing(beer+diapers);
run;

options missing=' ';
proc tabulate data=want;
class sex;
var beer diapers BeerAndDiapers TimeLagInDays;
keylabel all='Total' n=' ' sum=' ';
table sex='Gender' all, (beer*n diapers*n BeerAndDiapers*sum TimeLagInDays*mean)*f=best12.
;
run;


HTH
Patrick
Valued Guide
Posts: 2,177

Re: Help summarizing attributes

Posted in reply to deleted_user
Melissa
another approach
when you want to bring together different kinds of events
merge the different types like[pre] data combo ;
merge whole_Lot( in=beers where=( criteria for beer event) )
whole_Lot( in=diapr where=( criteria for diapers ) ) ;
by ID ;
join = put( 10* beers + diapr, z2. ) ;
if first.ID ;
run ; [/pre]
This simple approach asumes your data are in ID order
The JOIN variable takes 3 values
11 both diapers and beers
10 beers only
01 diapers only
You might want to take out the "if first.ID; " statement to see all transactions through the merge ( but then use a test like "if _n_ > 1000 then stop; ".
If you need to examine more criteria, like the time between transactions, do as much as possible on the where clauses applied to the data of the merge.
.
Good Luck

PeterC
Ask a Question
Discussion stats
  • 4 replies
  • 128 views
  • 0 likes
  • 3 in conversation