DATA Step, Macro, Functions and more

Creating a new column based on the frequency of a variable

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Creating a new column based on the frequency of a variable

Hi,
 
I am using sas 9.4 base
 
 
I want to measure the frequency of a each variable within a column (similar to proc freq) and then create a second column which states the frequency nxt to ac instance of the variable.
 
The data is from Facebook and has an individual row per action (like, comment, share) with an actor Id column (so if the actor has liked a post more than once it is on a separate row). So I want to rank each actor by the number of actions that he/she has taken.
 
This can be done with a proc freq report but I'm not sure how to create a column stating how many actions each actor has taken on every row.
 
Further it would be awesome if the time between action (each row is an action) can be measured on the actor level.... pls see the attached screenshot
 
Any help would be greatly appreciated! 
 
-Harrison

data screenshot.png

Accepted Solutions
Solution
‎02-16-2017 09:32 AM
Valued Guide
Posts: 797

Re: Creating a new column based on the frequency of a variable

Can you post what the input data looks like in SAS?  And what you want the output data to look like?

 

I think you want for every record, two new variables

  1. actor_freq:   The overall frequency of the actorid, taken from proc freq output
  2. time_lapse:     Time since last action by the same actorid

Absent your posting testable date ...

Assuming your data are sorted by the time stamp UPDATEDDATE, then this untested code would be a useful approach:  It uses hash objects name FREQ (taken from a data set made by proc freq) and PRIOR (built dynamically during the "DATA WANT" step):

 

proc freq data=have  noprint;
  tables actorid / out=lookupfreqs (keep=actorid count  rename=(count=actor_freq));
run;

data want;
  set have;
  if _n_=1 then do;
    if 0 then set lookupfreqs;
      declare hash freq(dataset:'lookupfreqs');
        freq.definekey('actorid');
        freq.definedata('actor_freq');
        freq.definedone();

    format prior_update datetime20.;
      declare hash prior();
        prior.definekey('actorid');
        prior.definedata('prior_update');
        prior.definedone();
  end;

  rc=freq.find();

  rc=prior.find();
  if rc=0 then time_lapse=updateddate - prior_update;
  prior_update=updateddate;
  rc=prior.replace();
run;

 

Notes:

  1. I repeat, dataset HAVE is assumed to be sorted by time stamp.
  2. You want to retrieve the frequency of total records by actorid, not by actorid/action_type.  That's what the hash object FREQS holds, taken directly from proc freq output.
  3. UPDATEDDATE is a sas datetime value.
  4. You want the time_lapse since the last action of any type by the same actorid.
  5. The hash object PRIOR, unlike FREQ, is dynamically built.  When an actoirid is enountered for the first time, the statement "rc=prior.find()" will generate a non-zero rc, so no time_lapse can be calculated.  But sebsequent find's for the same id will succeed (rc=0).  This object is meant to hold one item per actorid, always containing the time stamp of the most recent prior action of that actorid. 

One could also re-sort the dataset by actorid/updateddate, and then more easily (i.e. without a hash object) get time_lapse in a data step, but I assume your data would be large enough to make a proc sort costly.

View solution in original post


All Replies
Super User
Posts: 17,832

Re: Creating a new column based on the frequency of a variable

Use proc freq to get counts and then transpose to reshape the data to the form you want. 

 

New Contributor
Posts: 3

Re: Creating a new column based on the frequency of a variable

Can you post the code?

 

Solution
‎02-16-2017 09:32 AM
Valued Guide
Posts: 797

Re: Creating a new column based on the frequency of a variable

Can you post what the input data looks like in SAS?  And what you want the output data to look like?

 

I think you want for every record, two new variables

  1. actor_freq:   The overall frequency of the actorid, taken from proc freq output
  2. time_lapse:     Time since last action by the same actorid

Absent your posting testable date ...

Assuming your data are sorted by the time stamp UPDATEDDATE, then this untested code would be a useful approach:  It uses hash objects name FREQ (taken from a data set made by proc freq) and PRIOR (built dynamically during the "DATA WANT" step):

 

proc freq data=have  noprint;
  tables actorid / out=lookupfreqs (keep=actorid count  rename=(count=actor_freq));
run;

data want;
  set have;
  if _n_=1 then do;
    if 0 then set lookupfreqs;
      declare hash freq(dataset:'lookupfreqs');
        freq.definekey('actorid');
        freq.definedata('actor_freq');
        freq.definedone();

    format prior_update datetime20.;
      declare hash prior();
        prior.definekey('actorid');
        prior.definedata('prior_update');
        prior.definedone();
  end;

  rc=freq.find();

  rc=prior.find();
  if rc=0 then time_lapse=updateddate - prior_update;
  prior_update=updateddate;
  rc=prior.replace();
run;

 

Notes:

  1. I repeat, dataset HAVE is assumed to be sorted by time stamp.
  2. You want to retrieve the frequency of total records by actorid, not by actorid/action_type.  That's what the hash object FREQS holds, taken directly from proc freq output.
  3. UPDATEDDATE is a sas datetime value.
  4. You want the time_lapse since the last action of any type by the same actorid.
  5. The hash object PRIOR, unlike FREQ, is dynamically built.  When an actoirid is enountered for the first time, the statement "rc=prior.find()" will generate a non-zero rc, so no time_lapse can be calculated.  But sebsequent find's for the same id will succeed (rc=0).  This object is meant to hold one item per actorid, always containing the time stamp of the most recent prior action of that actorid. 

One could also re-sort the dataset by actorid/updateddate, and then more easily (i.e. without a hash object) get time_lapse in a data step, but I assume your data would be large enough to make a proc sort costly.

New Contributor
Posts: 3

Re: Creating a new column based on the frequency of a variable

Thanks for your help! Your solution worked perfectly

Super User
Posts: 10,500

Re: Creating a new column based on the frequency of a variable

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

Respected Advisor
Posts: 3,777

Re: Creating a new column based on the frequency of a variable

Your data is a picture.  You need to post data that community members interested in helping you can use in a program.  For example.

 

data fb;
   infile cards dsd missover;
   input (list of variable names)(informats);
   cards;
value,value,value,...
value,value,value,...
value,value,value,..
;;;;
   run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 154 views
  • 0 likes
  • 5 in conversation