
Base SAS


I want to create two variables that record buying status, so that my dataset looks like the table below. How many ways can we create a dataset like this? Please show the different ways to get the table.
brand        total_bought   burberry_bought   valentino_bought
burberry     yes            yes               no
valentino    yes            no                yes
valentino1   yes            no                yes

Re: Base sas

With what purpose? Are you creating this from scratch and need to know how? Or do you have a dataset and need to create new variables? Or is this theoretical, and you need to know how many different ways there are for a course?
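If the goal is simply to create the table from scratch, one minimal sketch is to type the values in directly with DATALINES. This assumes the column names and yes/no values shown in the question; the $15 and $3 lengths are illustrative choices, not something stated in the thread.

```sas
/* Build the desired table directly from in-stream data.               */
/* Lengths are assumptions: $15 for brand, $3 for the yes/no columns.  */
data want;
   length brand $15 total_bought burberry_bought valentino_bought $3;
   input brand $ total_bought $ burberry_bought $ valentino_bought $;
   datalines;
burberry yes yes no
valentino yes no yes
valentino1 yes no yes
;
run;
```

This only reproduces a fixed table; if the flags should be derived from existing data, the DATA step approaches in the replies are more useful.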


Re: Base sas


Providing some sample data and an example of what your desired result looks like would help. :)


I am going to assume that your dataset looks like this:


data have;
   length brand $15 total_bought $3;
   input brand $ total_bought $;
   datalines;
burberry yes
valentino yes
valentino1 yes
;
run;


And that you want to create the variables burberry_bought and valentino_bought as below:



data want;
   set have;
   length burberry_bought valentino_bought $3;
   if index(brand, 'burberry') > 0 then burberry_bought = 'yes'; else burberry_bought = 'no';
   if index(brand, 'valentino') > 0 then valentino_bought = 'yes'; else valentino_bought = 'no';
run;
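Another way to get the same result, sketched here under the assumption that the input looks like the sample HAVE data above, is a PROC SQL query with CASE expressions. The output name WANT_SQL is only an illustrative choice.

```sas
/* Sample input: one row per brand (same values as the sample above). */
data have;
   length brand $15 total_bought $3;
   input brand $ total_bought $;
   datalines;
burberry yes
valentino yes
valentino1 yes
;
run;

/* Derive the yes/no flags with CASE expressions instead of IF/THEN. */
proc sql;
   create table want_sql as
   select brand,
          total_bought,
          case when index(brand, 'burberry') > 0 then 'yes' else 'no' end
             as burberry_bought length=3,
          case when index(brand, 'valentino') > 0 then 'yes' else 'no' end
             as valentino_bought length=3
   from have;
quit;
```

Because INDEX matches substrings, valentino1 is flagged as a Valentino purchase, which agrees with the desired table in the question.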


Hope it helps. :)




Re: Base sas

As a minor change to @draycut's solution I would suggest:

data want;
   set have;
   burberry_bought  = index(UPCASE(brand), 'BURBERRY')  > 0;
   valentino_bought = index(UPCASE(brand), 'VALENTINO') > 0;
run;

This assigns 1 for true and 0 for false. If you really need to display Yes/No text, you can assign a custom format. The 1/0 coding also lends itself much better to summaries: the SUM of a bought variable is the number of times that brand was bought, and the MEAN is the proportion bought, expressed as a decimal. Also, if your actual data has one field that can hold multiple entries, such as "burberry valentino", you can extend this approach and sum the flag variables within a record to count how many brands were bought.
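To illustrate the custom format and the summaries described above, here is a sketch. The YESNO format name, the BRANDS_BOUGHT variable, and the multi-brand sample row are hypothetical additions, not from the original post.

```sas
/* Hypothetical sample: the third record mentions two brands. */
data have;
   infile datalines truncover;
   length brand $40;
   input brand $40.;
   datalines;
burberry
valentino1
Burberry valentino
;
run;

/* Display 1/0 flags as Yes/No without changing the stored values. */
proc format;
   value yesno 1 = 'Yes'
               0 = 'No';
run;

data want;
   set have;
   burberry_bought  = index(upcase(brand), 'BURBERRY')  > 0;
   valentino_bought = index(upcase(brand), 'VALENTINO') > 0;
   /* Number of brands bought within this record. */
   brands_bought = sum(burberry_bought, valentino_bought);
   format burberry_bought valentino_bought yesno.;
run;

/* SUM = records with a purchase; MEAN = proportion of records (0 to 1). */
proc means data=want sum mean;
   var burberry_bought valentino_bought;
run;
```

The format only changes how the values print; PROC MEANS still works on the underlying 1/0 numbers.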



Applying UPCASE to the variable and searching for an uppercase value protects against data entry with values like Burberry, burBerry, or other variations in letter case. Since your example data had two different values involving VALENTINO, this seems a likely concern.
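As an alternative to UPCASE, the FIND function accepts an 'i' modifier that makes the search ignore letter case. A minimal sketch, using hypothetical mixed-case sample values:

```sas
/* Hypothetical mixed-case input values. */
data have;
   length brand $15;
   input brand $;
   datalines;
Burberry
burBerry
VALENTINO1
;
run;

data want;
   set have;
   /* The 'i' modifier tells FIND to ignore case, so no UPCASE is needed. */
   burberry_bought  = find(brand, 'burberry', 'i')  > 0;
   valentino_bought = find(brand, 'valentino', 'i') > 0;
run;
```

Either approach gives 1/0 flags; UPCASE with INDEX makes the case handling explicit, while FIND keeps it in a single function call.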
