Hello,
I am a student learning SAS programming. I've been tasked with creating a segment
Here is the last results that I am using:
Now what I need to do is:
Create 3 new "Customer Purchase Segments":
i) The fans that bought all four merchandise items. Name this "All Merch".
ii) The fans that did not buy any merchandise items. Name this "No Merch".
iii) The fans that bought at least one item, but not more than 3 items. "Some Merch".
I suspect that I am supposed to use if / then statements, but I'm not sure. Since I have 4 columns of data to consider (highlighted in the image above), I'm not entirely sure how to write this.
Please help.
Thank you so much, in advance!
And another idea:
data profoot.pro_football_segments; set profoot.pro_football_groups; total_merch= sum(bought_other_merch='Yes',bought_sweatshirt='Yes',bought_team_hat='Yes',bought_team_tshirt='Yes'); /* typically at this point I would create a custom format for the total and be done*/ length segment $ 20; select (total_merch); when (0) segment='No merch'; when (4) segment='All merch'; otherwise segment='Some merch'; end; run;
The Total_merch numeric may be useful other purposes.
A format like this could be used to format total_merch for analysis or report purposes.
proc format library=work; value total_merch 0,. = 'No merch' 4 = 'All merch' 1,2,3 = 'Some merch' ; run;
Example:
if bought_hat='Yes' and bought_tshirt='Yes' and bought_other='Yes' and bought_sweatshirt='Yes' then segment='All Merch';
Thank you so much for the quick response. Do I need to create the variable first? If so, could you guide me on how that is done? I can't seem to get it work; keep getting errors in my log.
Thank you, again!
Post your log, then we can see what is wrong 🙂
You can run the following code on your data to find out what the variable names actually are, as I suspect the ones you see are labels
proc contents data = yourdata; /* <-- Insert your dataset name here */
run;
@bldudley wrote:
Thank you so much for the quick response. Do I need to create the variable first? If so, could you guide me on how that is done? I can't seem to get it work; keep getting errors in my log.
Thank you, again!
You can't just say "keep getting errors in my log".
Show us the SASLOG so we can see what you are seeing.
You have a sample above but I thought I'd point you to some references so you can read up on this if you want:
SAS docs that cover IF/THEN statements
SAS Video tutorials list:
http://support.sas.com/training/tutorial/
Note on the bottom right hand corner you'll find a link to the free SAS programming e-course.
Hello,
I am adding my code and log information
data profoot.pro_football_segments;
set profoot.pro_football_groups;
label segment = 'Customer Purchase Segments';
if bought_other_merch='Yes' and bought_sweatshirt='Yes' and bought_team_hat='Yes' and bought_team_tshirt='Yes' then segment='All Merch';
if bought_other_merch='No' and bought_sweatshirt='No' and bought_team_hat='No' and bought_team_tshirt='No' then segment='No Merch';
Else if 'Some Merch';
run;
LOG:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
61
62 data profoot.pro_football_segments;
63 set profoot.pro_football_groups;
64 label segment = 'Customer Purchase Segments';
65 if bought_other_merch='Yes' and bought_sweatshirt='Yes' and bought_team_hat='Yes' and bought_team_tshirt='Yes' then
65 ! segment='All Merch';
66 if bought_other_merch='No' and bought_sweatshirt='No' and bought_team_hat='No' and bought_team_tshirt='No' then
66 ! segment='No Merch';
67 Else if 'Some Merch';
68 run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
67:10
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=10153 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=No Bought_Sweatshirt=Yes payment_method=Team Website
dollars_spent=155 H=. i=5 gender=Male age=46 Marital_Status=Married Game_Attendance=Season Ticket Holder games_attended=8
Age_bracket=40s segment= _ERROR_=1 _N_=1
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=10246 Bought_Team_Hat=Yes Bought_Team_TShirt=No Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Team Website
dollars_spent=190 H=. i=5 gender=Female age=19 Marital_Status=Single Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=Under 20 segment= _ERROR_=1 _N_=2
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=10591 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=No payment_method=Stadium Credit
dollars_spent=135 H=. i=5 gender=Female age=24 Marital_Status=Married Game_Attendance=Attends More Than 50% of Games
games_attended=5 Age_bracket=20s segment= _ERROR_=1 _N_=3
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=10791 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=No payment_method=Team Website
dollars_spent=135 H=. i=5 gender=Male age=42 Marital_Status=Married Game_Attendance=Attends Less Than 50% of Games games_attended=2
Age_bracket=40s segment= _ERROR_=1 _N_=4
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=11319 Bought_Team_Hat=Yes Bought_Team_TShirt=No Bought_Other_Merch=Yes Bought_Sweatshirt=No payment_method=Team Website
dollars_spent=105 H=. i=5 gender=Male age=32 Marital_Status=Single Game_Attendance=Attends Less Than 50% of Games games_attended=3
Age_bracket=30s segment= _ERROR_=1 _N_=5
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=11925 Bought_Team_Hat=Yes Bought_Team_TShirt=No Bought_Other_Merch=Yes Bought_Sweatshirt=No payment_method=Stadium Credit
dollars_spent=105 H=. i=5 gender=Female age=35 Marital_Status=Single Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=30s segment= _ERROR_=1 _N_=6
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=12369 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Team Website
dollars_spent=220 H=. i=5 gender=Male age=45 Marital_Status=Married Game_Attendance=Attends More Than 50% of Games games_attended=6
Age_bracket=40s segment=All Merch _ERROR_=1 _N_=7
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=12524 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Stadium Cash
dollars_spent=220 H=. i=5 gender=Female age=27 Marital_Status=Married Game_Attendance=Attends More Than 50% of Games
games_attended=5 Age_bracket=20s segment=All Merch _ERROR_=1 _N_=8
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=12532 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=No Bought_Sweatshirt=Yes payment_method=Stadium Credit
dollars_spent=155 H=. i=5 gender=Female age=48 Marital_Status=Single Game_Attendance=Attends Less Than 50% of Games games_attended=4
Age_bracket=40s segment= _ERROR_=1 _N_=9
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=12539 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=No payment_method=Stadium Credit
dollars_spent=135 H=. i=5 gender=Female age=50 Marital_Status=Single Game_Attendance=Attends More Than 50% of Games games_attended=7
Age_bracket=50s segment= _ERROR_=1 _N_=10
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=12784 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Team Website
dollars_spent=220 H=. i=5 gender=Female age=44 Marital_Status=Married Game_Attendance=Attends More Than 50% of Games
games_attended=7 Age_bracket=40s segment=All Merch _ERROR_=1 _N_=11
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=12978 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=No payment_method=Team Website
dollars_spent=135 H=. i=5 gender=Female age=25 Marital_Status=Married Game_Attendance=Attends More Than 50% of Games
games_attended=7 Age_bracket=20s segment= _ERROR_=1 _N_=12
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=13107 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Stadium Credit
dollars_spent=220 H=. i=5 gender=Male age=58 Marital_Status=Single Game_Attendance=Attends More Than 50% of Games games_attended=7
Age_bracket=50s segment=All Merch _ERROR_=1 _N_=13
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=13172 Bought_Team_Hat=Yes Bought_Team_TShirt=No Bought_Other_Merch=No Bought_Sweatshirt=No payment_method=Stadium Cash
dollars_spent=40 H=. i=5 gender=Male age=25 Marital_Status=Married Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=20s segment= _ERROR_=1 _N_=14
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=13296 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Stadium Cash
dollars_spent=220 H=. i=5 gender=Male age=49 Marital_Status=Married Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=40s segment=All Merch _ERROR_=1 _N_=15
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=13308 Bought_Team_Hat=Yes Bought_Team_TShirt=No Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Stadium Credit
dollars_spent=190 H=. i=5 gender=Female age=54 Marital_Status=Single Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=50s segment= _ERROR_=1 _N_=16
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=13408 Bought_Team_Hat=Yes Bought_Team_TShirt=No Bought_Other_Merch=No Bought_Sweatshirt=No payment_method=Stadium Cash
dollars_spent=40 H=. i=5 gender=Male age=68 Marital_Status=Married Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=60s segment= _ERROR_=1 _N_=17
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=13443 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Team Website
dollars_spent=280 H=. i=5 gender=Male age=45 Marital_Status=Single Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=40s segment=All Merch _ERROR_=1 _N_=18
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
Fan_ID=13668 Bought_Team_Hat=Yes Bought_Team_TShirt=Yes Bought_Other_Merch=No Bought_Sweatshirt=Yes payment_method=Stadium Cash
dollars_spent=155 H=. i=5 gender=Female age=52 Marital_Status=Single Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=50s segment= _ERROR_=1 _N_=19
NOTE: Invalid numeric data, 'Some Merch' , at line 67 column 10.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
Fan_ID=13880 Bought_Team_Hat=Yes Bought_Team_TShirt=No Bought_Other_Merch=Yes Bought_Sweatshirt=Yes payment_method=Team Website
dollars_spent=190 H=. i=5 gender=Male age=48 Marital_Status=Single Game_Attendance=Attends Less Than 50% of Games games_attended=1
Age_bracket=40s segment= _ERROR_=1 _N_=20
NOTE: There were 473 observations read from the data set PROFOOT.PRO_FOOTBALL_GROUPS.
NOTE: The data set PROFOOT.PRO_FOOTBALL_SEGMENTS has 10 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.02 seconds
69
70 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
83
I should have over 400 observations.
@bldudley wrote:
Hello,
I am adding my code and log information
LOG: 1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 61 62 data profoot.pro_football_segments; 63 set profoot.pro_football_groups; 64 label segment = 'Customer Purchase Segments'; 65 if bought_other_merch='Yes' and bought_sweatshirt='Yes' and bought_team_hat='Yes' and bought_team_tshirt='Yes' then 65 ! segment='All Merch'; 66
I should have over 400 observations.
The first time you reference the variable SEGMENT was in the LABEL statement. Since a type had not been assigned prior to that then SAS assumend that SEGMENT was going to be a numeric variable. "Some merch" is not valid for numeric.
Add
Length segment $ 20;
before the label statement to declare the variable as character.
Thank you. That did help.
However, I'm still a little stuck.
I'm supposed to have 473 observations. I changed the second line from "if" to Else "if" and I had more observations show, but still not all 473. Any ideas?
Code:
data profoot.pro_football_segments;
set profoot.pro_football_groups;
Length segment $ 15;
label segment = 'Customer Purchase Segments';
if bought_other_merch='Yes' and bought_sweatshirt='Yes' and bought_team_hat='Yes' and bought_team_tshirt='Yes' then segment='All Merch';
else if bought_other_merch='No' and bought_sweatshirt='No' and bought_team_hat='No' and bought_team_tshirt='No' then segment='No Merch';
Else if 'Some Merch';
run;
Log after running that data step with length format included:
The ELSE is incorrect, it should be:
Else segment = 'Some Merch';
Yes! Thank you SO much.
This makes sense to me now.
And another idea:
data profoot.pro_football_segments; set profoot.pro_football_groups; total_merch= sum(bought_other_merch='Yes',bought_sweatshirt='Yes',bought_team_hat='Yes',bought_team_tshirt='Yes'); /* typically at this point I would create a custom format for the total and be done*/ length segment $ 20; select (total_merch); when (0) segment='No merch'; when (4) segment='All merch'; otherwise segment='Some merch'; end; run;
The Total_merch numeric may be useful other purposes.
A format like this could be used to format total_merch for analysis or report purposes.
proc format library=work; value total_merch 0,. = 'No merch' 4 = 'All merch' 1,2,3 = 'Some merch' ; run;
Formats, as @ballardw says, make your life easier.
In addition, instead of having Yes or No as the variable's values, if you use 1 or 0 instead of Yes or No, it's a lot less typing because all you have to do is add the four variables together with the SUM function.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.