BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hello, I have a data set with two sorts of “attributes” (probably not using term in strict database sense).

I have a perminant attribute of the unique identifier (for example gender). I have multiple records per identifier which describe a transaction (like an item purchased).

“ID” – unique identifier
“Sex” – gender of ID person
“Product” – what was purchased (beer, diapers, pencils)
Purch_dt – date of purchase

1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008

I want to
- count the number of people who purchased both beer and diapers
- count the number of people who purchased beer and no diapers
- determine the number of women who purchased beer and diapers
- for those who purchased both, determine the time lag between the purchase of beer and diapers

I have not written successful code but had two different thoughts (but am willing to do anything that will work).

I tried proc SQL “group by ID” but my instinct in English would to say … by ID where Product = beer and Product = diapers but I do not believe this is correct.

I tried a do loop by ID and flag if Product = beer and if Product = diapers then do, by unique IDs count flags but this was cumbersome and while the code did something I am not sure it was correct.
4 REPLIES 4
Patrick
Opal | Level 21
"for those who purchased both, determine the time lag between the purchase of beer and diapers"


What would be the time lag for the following case? Or would there be several time lags?
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
1234 M beer Nov 17, 2008
1234 M pencils Nov 30, 2008
1234 F diapers Dec 1, 2008
1234 F beer Dec 24, 2008



For the other 3 requirements:

data have;
infile datalines truncover;
input id sex:$1. product:$8. purch_dt anydtdte12.;
format purch_dt date9.;
datalines;
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008
;

data want;
set have;
by id;
retain flag '00';
select (product);
when ('beer') substr(flag,1,1)='1';
when ('diapers') substr(flag,2,1)='1';
otherwise;
end;
if last.id then
do;
output;
flag='00';
end;
run;

proc format;
value $flag
'00' = 'No Beer / No Diapers'
'10' = 'Beer / No Diapers'
'01' = 'No Beer / Diapers'
'11' = 'Beer / Diapers'
;
run;

proc tabulate data=want;
class sex flag;
format flag $flag.;
keylabel all='Total' n=' ';
table sex='Gender' all,(flag='Products Purchased' all)*f=best12.
;
run;

HTH
Patrick Message was edited by: Patrick
deleted_user
Not applicable
Patrick,

Thank you for your reply. I will get right to work applying the flags. As I am sure you guessed the actual data set is more complex but I tried to make my example easy. The actual data has a quality like "buy beer" but it is a one time event unlike actual grocery purchases where one could buy beer on multiple dates.

Thank you again,
Melissa
Patrick
Opal | Level 21
Hi Melissa

The way you describe your data the following approach might also be interesting for you:


data have;
infile datalines truncover;
input id sex:$1. product:$8. purch_dt anydtdte12.;
format purch_dt date9.;
datalines;
1234 F beer Nov 1, 2008
1234 F diapers Nov 24, 2008
5678 M beer Nov 17, 2008
5678 M pencils Nov 30, 2008
;

proc sql;
create view Vhave as
select * from have
where product in ('beer','diapers')
order by id,sex;
quit;

proc transpose data=Vhave out=want;
by id sex;
id product;
var purch_dt;
run;

data want;
set want;
TimeLagInDays=datdif(beer, diapers, 'act/act');
BeerAndDiapers= not missing(beer+diapers);
run;

options missing=' ';
proc tabulate data=want;
class sex;
var beer diapers BeerAndDiapers TimeLagInDays;
keylabel all='Total' n=' ' sum=' ';
table sex='Gender' all, (beer*n diapers*n BeerAndDiapers*sum TimeLagInDays*mean)*f=best12.
;
run;


HTH
Patrick
Peter_C
Rhodochrosite | Level 12
Melissa
another approach
when you want to bring together different kinds of events
merge the different types like[pre] data combo ;
merge whole_Lot( in=beers where=( criteria for beer event) )
whole_Lot( in=diapr where=( criteria for diapers ) ) ;
by ID ;
join = put( 10* beers + diapr, z2. ) ;
if first.ID ;
run ; [/pre]
This simple approach asumes your data are in ID order
The JOIN variable takes 3 values
11 both diapers and beers
10 beers only
01 diapers only
You might want to take out the "if first.ID; " statement to see all transactions through the merge ( but then use a test like "if _n_ > 1000 then stop; ".
If you need to examine more criteria, like the time between transactions, do as much as possible on the where clauses applied to the data of the merge.
.
Good Luck

PeterC

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 718 views
  • 0 likes
  • 3 in conversation