BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hk1234
Calcite | Level 5
Hi,
 
I am using sas 9.4 base
 
 
I want to measure the frequency of a each variable within a column (similar to proc freq) and then create a second column which states the frequency nxt to ac instance of the variable.
 
The data is from Facebook and has an individual row per action (like, comment, share) with an actor Id column (so if the actor has liked a post more than once it is on a separate row). So I want to rank each actor by the number of actions that he/she has taken.
 
This can be done with a proc freq report but I'm not sure how to create a column stating how many actions each actor has taken on every row.
 
Further it would be awesome if the time between action (each row is an action) can be measured on the actor level.... pls see the attached screenshot
 
Any help would be greatly appreciated! 
 
-Harrison

data screenshot.png
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Can you post what the input data looks like in SAS?  And what you want the output data to look like?

 

I think you want for every record, two new variables

  1. actor_freq:   The overall frequency of the actorid, taken from proc freq output
  2. time_lapse:     Time since last action by the same actorid

Absent your posting testable date ...

Assuming your data are sorted by the time stamp UPDATEDDATE, then this untested code would be a useful approach:  It uses hash objects name FREQ (taken from a data set made by proc freq) and PRIOR (built dynamically during the "DATA WANT" step):

 

proc freq data=have  noprint;
  tables actorid / out=lookupfreqs (keep=actorid count  rename=(count=actor_freq));
run;

data want;
  set have;
  if _n_=1 then do;
    if 0 then set lookupfreqs;
      declare hash freq(dataset:'lookupfreqs');
        freq.definekey('actorid');
        freq.definedata('actor_freq');
        freq.definedone();

    format prior_update datetime20.;
      declare hash prior();
        prior.definekey('actorid');
        prior.definedata('prior_update');
        prior.definedone();
  end;

  rc=freq.find();

  rc=prior.find();
  if rc=0 then time_lapse=updateddate - prior_update;
  prior_update=updateddate;
  rc=prior.replace();
run;

 

Notes:

  1. I repeat, dataset HAVE is assumed to be sorted by time stamp.
  2. You want to retrieve the frequency of total records by actorid, not by actorid/action_type.  That's what the hash object FREQS holds, taken directly from proc freq output.
  3. UPDATEDDATE is a sas datetime value.
  4. You want the time_lapse since the last action of any type by the same actorid.
  5. The hash object PRIOR, unlike FREQ, is dynamically built.  When an actoirid is enountered for the first time, the statement "rc=prior.find()" will generate a non-zero rc, so no time_lapse can be calculated.  But sebsequent find's for the same id will succeed (rc=0).  This object is meant to hold one item per actorid, always containing the time stamp of the most recent prior action of that actorid. 

One could also re-sort the dataset by actorid/updateddate, and then more easily (i.e. without a hash object) get time_lapse in a data step, but I assume your data would be large enough to make a proc sort costly.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
Reeza
Super User

Use proc freq to get counts and then transpose to reshape the data to the form you want. 

 

hk1234
Calcite | Level 5

Can you post the code?

 

mkeintz
PROC Star

Can you post what the input data looks like in SAS?  And what you want the output data to look like?

 

I think you want for every record, two new variables

  1. actor_freq:   The overall frequency of the actorid, taken from proc freq output
  2. time_lapse:     Time since last action by the same actorid

Absent your posting testable date ...

Assuming your data are sorted by the time stamp UPDATEDDATE, then this untested code would be a useful approach:  It uses hash objects name FREQ (taken from a data set made by proc freq) and PRIOR (built dynamically during the "DATA WANT" step):

 

proc freq data=have  noprint;
  tables actorid / out=lookupfreqs (keep=actorid count  rename=(count=actor_freq));
run;

data want;
  set have;
  if _n_=1 then do;
    if 0 then set lookupfreqs;
      declare hash freq(dataset:'lookupfreqs');
        freq.definekey('actorid');
        freq.definedata('actor_freq');
        freq.definedone();

    format prior_update datetime20.;
      declare hash prior();
        prior.definekey('actorid');
        prior.definedata('prior_update');
        prior.definedone();
  end;

  rc=freq.find();

  rc=prior.find();
  if rc=0 then time_lapse=updateddate - prior_update;
  prior_update=updateddate;
  rc=prior.replace();
run;

 

Notes:

  1. I repeat, dataset HAVE is assumed to be sorted by time stamp.
  2. You want to retrieve the frequency of total records by actorid, not by actorid/action_type.  That's what the hash object FREQS holds, taken directly from proc freq output.
  3. UPDATEDDATE is a sas datetime value.
  4. You want the time_lapse since the last action of any type by the same actorid.
  5. The hash object PRIOR, unlike FREQ, is dynamically built.  When an actoirid is enountered for the first time, the statement "rc=prior.find()" will generate a non-zero rc, so no time_lapse can be calculated.  But sebsequent find's for the same id will succeed (rc=0).  This object is meant to hold one item per actorid, always containing the time stamp of the most recent prior action of that actorid. 

One could also re-sort the dataset by actorid/updateddate, and then more easily (i.e. without a hash object) get time_lapse in a data step, but I assume your data would be large enough to make a proc sort costly.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hk1234
Calcite | Level 5

Thanks for your help! Your solution worked perfectly

ballardw
Super User

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

data_null__
Jade | Level 19

Your data is a picture.  You need to post data that community members interested in helping you can use in a program.  For example.

 

data fb;
   infile cards dsd missover;
   input (list of variable names)(informats);
   cards;
value,value,value,...
value,value,value,...
value,value,value,..
;;;;
   run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2594 views
  • 0 likes
  • 5 in conversation