Base sas

Contributor
Posts: 23

Hi,
I want to create two variables that hold the buying status for each brand, so that my dataset looks like the one below. In how many ways can a dataset like this be created? Please show different ways to get this table.
 
 
brand        total_bought   burberry_bought   valentino_bought
burberry     yes            yes               no
valentino    yes            no                yes
valentino1   yes            no                yes

Super User
Posts: 17,863

Re: Base sas

With what purpose? Are you creating this from scratch and need to know how? Or have a dataset and need to create new variables? Or is this theoretical and you need to know how many different ways for a course? 

PROC Star
Posts: 552

Re: Base sas

Providing some sample data and an example of what your desired result looks like would help.

 

I am going to assume that your dataset looks like this

 

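data have;
   length brand $15;                /* declare the length explicitly so 'valentino1' is not truncated */
   input brand $ total_bought $;    /* list input: values are separated by blanks */
   datalines;
burberry yes
valentino yes
valentino1 yes
;
run;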

 

 And that you want to create the variables burberry_bought and valentino_bought as below

 

 

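data want;
   set have;
   length burberry_bought valentino_bought $3;   /* long enough for 'yes' and 'no' */
   /* INDEX returns the position of the substring, or 0 if it is not found */
   if index(brand, 'burberry') > 0 then burberry_bought = 'yes';
   else burberry_bought = 'no';
   if index(brand, 'valentino') > 0 then valentino_bought = 'yes';
   else valentino_bought = 'no';
run;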
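
Since the question asks for different ways, the same table could also be built with PROC SQL and a CASE expression. This is only a sketch under the same assumption about the input data; the output table name want_sql is made up for illustration.

```sas
proc sql;
   /* want_sql is a hypothetical name, chosen so the DATA step result is not overwritten */
   create table want_sql as
   select brand,
          total_bought,
          /* INDEX is case-sensitive, so this matches lowercase brand names only */
          case when index(brand, 'burberry')  > 0 then 'yes' else 'no' end as burberry_bought,
          case when index(brand, 'valentino') > 0 then 'yes' else 'no' end as valentino_bought
   from have;
quit;
```

Both approaches should give the same three rows; which one to use is mostly a matter of taste.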

 

Hope it helps.

 

 

Super User
Posts: 10,516

Re: Base sas

As a minor change to @draycut's solution I would suggest:

data want;
   set have;
   burberry_bought  = (index(upcase(brand), 'BURBERRY')  > 0);
   valentino_bought = (index(upcase(brand), 'VALENTINO') > 0);
run;

This assigns a value of 1 for true and 0 for false. If you really need to show the text yes/no, a custom format can be assigned. The 1/0 coding lends itself to summaries much better: the SUM of a bought variable is the total number of times bought, and the MEAN is the proportion bought, in decimal form. Also, if your actual data has one field with potentially multiple entries, such as "burberry valentino", extending this approach lets you sum the variables within a record to know how many brands were bought.
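
A minimal sketch of what that could look like, assuming the have dataset from the earlier reply. The format name yesno and the variable brands_bought are made up for illustration.

```sas
/* Custom format so the 1/0 flags display as yes/no (the name yesno is an assumption) */
proc format;
   value yesno 1 = 'yes'
               0 = 'no';
run;

data want;
   set have;
   burberry_bought  = (index(upcase(brand), 'BURBERRY')  > 0);
   valentino_bought = (index(upcase(brand), 'VALENTINO') > 0);
   /* number of brands matched within one record (hypothetical extra variable) */
   brands_bought = sum(burberry_bought, valentino_bought);
   /* the stored values stay numeric; only the display changes */
   format burberry_bought valentino_bought yesno.;
run;

/* SUM = number of records flagged, MEAN = proportion of records flagged */
proc means data=want sum mean;
   var burberry_bought valentino_bought;
run;
```

Because the format only changes how the values are displayed, the flags can still be summed and averaged as numbers.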

 

 

The UPCASE call, together with the uppercase search string, helps in case your data entry has values like Burberry, burBerry and other variations in letter case. Since your example data had two different values involving VALENTINO, this seems a likely concern.
