BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jkf91
Calcite | Level 5

dear experts,

Please help me with the following:

2 Data Sets

1. I have three different kinds of dates in a demographic data set called cig.mergedpanel: work_date, restaurant_date and bar_date.

2. I have a purchase data set called cig.panelsample.

I need to count total number of purchases for each householdsin cig.panelsample before each of the three dates: work_date, restaurant_date and bar_date.

data cig.panelsample_b4law;

          SET cig.panelsample;

          by household_id;

STEP1

          if first.household_id then npb4w=0 npb4r=0 npb4b=0;

STEP2

          if cig.panelsample(purchase_date)<cig.mergedpanel(work_date) then npb4w+1;

          if cig.panelsample(purchase_date)<cig.mergedpanel(restaurant_date) then npb4r+1;

          if cig.panelsample(purchase_date)<cig.mergedpanel(bar_date) then npb4b+1;

run;

This wouldn't work for many reasons that are unknown to me (first time user). One obvious reason is that I have multiple statements after then (STEP1).

Also, i'm not sure how to make comparisons between two different data sets without merging the two (STEP2).

any help would be greatly appreciated!

best,

c

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Then, I have to think that your merge didn't create what you think it did.  Your code appears to be correct, other than you probably want to add a statement that only outputs the results for the last record for each household.

Without seeing the data one can only guess.  My first guesses would be that some or all of the dates haven't been read in as SAS dates or that you simply don't have any records that meet the conditions that you specified.

Take a look at the following.  It appears to be doing what you want to do and the only difference between it and your code is the addition of the suggested output statement:

data cig.Merged2;

  informat purchase_date

           work_date

           restaurant_date

           bar_date

           date9.;

  input household_id

        purchase_date

        work_date

        restaurant_date

        bar_date

        ;

  cards;

1 10AUG2011 11AUG2011 12AUG2011 10AUG2011

1 09AUG2011 11AUG2011 12AUG2011 10AUG2011

1 15AUG2011 11AUG2011 12AUG2011 10AUG2011

2 06AUG2011 11AUG2011 12AUG2011 10AUG2011

2 09AUG2011 11AUG2011 12AUG2011 10AUG2011

2 08AUG2011 11AUG2011 12AUG2011 10AUG2011

;

data cig.Merged2_bna_law;

  SET cig.Merged2;

  by household_id;

  if first.household_id then do;

            npb4w=0;

            npb4r=0;

            npb4b=0;

  end;

  if purchase_date<work_date then npb4w+1;

  if purchase_date<restaurant_date then npb4r+1;

  if purchase_date<bar_date then npb4b+1;

  if last.household_id then output;

run;

View solution in original post

11 REPLIES 11
art297
Opal | Level 21

You have to provide more information.

The step 1 part is easy to correct.  It has to be in the form:

if first.household_id then do;

   npb4w=0;

   npb4r=0;

   npb4b=0;

end;

However, regarding step 2, you probably have to merge the two files together, but one can't show you how without knowing the structure of the two files.  Does the mergedpanel file include the same household_id's as the panelsample file or, if not, what does it look like?

jkf91
Calcite | Level 5

thank you, sir. same household_id's.

jkf91
Calcite | Level 5

i will merged the two first. could you please answer as if everything is in the same file?

each household has different work_date, restaurant_date and bar_date.

many thanks,

c

art297
Opal | Level 21

If you merge the two files together (and that can be separately or in the same datastep), then you would just use code something like:

          if purchase_date<work_date then npb4w+1;

          if purchase_date<restaurant_date then npb4r+1;

          if purchase_date<bar_date then npb4b+1;

jkf91
Calcite | Level 5

I've tried the following, but I get 0's for all three variables and all rows.

data cig.Merged2_bna_law;

          SET cig.Merged2;

          by household_id;

          if first.household_id then do;

                    npb4w=0;

                    npb4r=0;

                    npb4b=0;

          end;

          if purchase_date<work_date then npb4w+1;

          if purchase_date<restaurant_date then npb4r+1;

          if purchase_date<bar_date then npb4b+1;

run;

art297
Opal | Level 21

Then, I have to think that your merge didn't create what you think it did.  Your code appears to be correct, other than you probably want to add a statement that only outputs the results for the last record for each household.

Without seeing the data one can only guess.  My first guesses would be that some or all of the dates haven't been read in as SAS dates or that you simply don't have any records that meet the conditions that you specified.

Take a look at the following.  It appears to be doing what you want to do and the only difference between it and your code is the addition of the suggested output statement:

data cig.Merged2;

  informat purchase_date

           work_date

           restaurant_date

           bar_date

           date9.;

  input household_id

        purchase_date

        work_date

        restaurant_date

        bar_date

        ;

  cards;

1 10AUG2011 11AUG2011 12AUG2011 10AUG2011

1 09AUG2011 11AUG2011 12AUG2011 10AUG2011

1 15AUG2011 11AUG2011 12AUG2011 10AUG2011

2 06AUG2011 11AUG2011 12AUG2011 10AUG2011

2 09AUG2011 11AUG2011 12AUG2011 10AUG2011

2 08AUG2011 11AUG2011 12AUG2011 10AUG2011

;

data cig.Merged2_bna_law;

  SET cig.Merged2;

  by household_id;

  if first.household_id then do;

            npb4w=0;

            npb4r=0;

            npb4b=0;

  end;

  if purchase_date<work_date then npb4w+1;

  if purchase_date<restaurant_date then npb4r+1;

  if purchase_date<bar_date then npb4b+1;

  if last.household_id then output;

run;

gxu
Calcite | Level 5 gxu
Calcite | Level 5

I agree with art297.  I tried with simple numbers instead of dates and can see the three variables correctly accumulated within each household.

jkf91
Calcite | Level 5

thank you very much, sir! Smiley Happy

FriedEgg
SAS Employee

When was it changed that these types of group by summarizations no longer required a retain statement containing the summarizing variables?  I tested both ways and results are the same (using SAS under Unix 9.2).  However I remember in previous versions or possible different circumstances where if the retain statement was not used results were incorrect.

data cig.Merged2_bna_law;

  SET cig.Merged2;

  by household_id;

  retain npb4w npb4r npb4b;

  if first.household_id then do;

            npb4w=0;

            npb4r=0;

            npb4b=0;

  end;

  if purchase_date<work_date then npb4w+1;

  if purchase_date<restaurant_date then npb4r+1;

  if purchase_date<bar_date then npb4b+1;

  if last.household_id then output;

run;

art297
Opal | Level 21

When you use the form

npb4w+1

the variable is automatically retained and the summation is equivalent to using the sum function.

FriedEgg
SAS Employee

Ah, that's right, thank you for the clarification.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 4678 views
  • 4 likes
  • 4 in conversation