BookmarkSubscribeRSS Feed
matoma
Obsidian | Level 7

Picture1.pngYou are a User Experience Researcher for Wikipedia media lab. Your supervisor needs a log analysis using the wikidataset. The wiki dataset logged each user’s activity modifying Wikipedia webpages. The User variable indicates the username or user’s IP. The Name variable indicates the webpage they modified. The changes variable indicates the number of places on that webpage they have made changes.

  1. Please create a macro that can be used generally to transpose the long format dataset to a wide format dataset by person, by year, and by month. The macro needs to accommodate the scenario where different users have different numbers of entries.     

I'm obviously not asking for anyone to do this for me I'm just very confused as to where to even start so if anyone has any advice I would really appreciate it. I attached the question as well as a snapshot of what the data looks like. 

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Homework assignment ... yuk. Why do you need a macro, why doesn't PROC TRANSPOSE without a macro do the job? Why would you need to transpose long to wide (which is generally not a good thing to do)?

 

The first step in any macro writing is to get code that works without the macros. PROC TRANSPOSE does the job, get that to work. After that, it's not obvious what a macro would do, that isn't explained.

 

Or you could tell your professor that people have already written a %TRANSPOSE macro, so you are happy to use that, and you won't be writing your own macro.

--
Paige Miller
ballardw
Super User

One place to start would be to decide what the RESULT looks like. The result would likely be driven by want you want to do for analysis or reporting, which was not mentioned at all.

The role of the Name is not mentioned at all. So is that to be kept or discarded.

 

If the purpose of the exercise is to determine how many changes a user made per month since an explicit requirement of a data set is not stated I would be strongly tempted to go direct to a reporting procedure such a Proc Tabulate or Report.

Assigning an appropriate format to the date(time variable) would create a year/month display and use that as an Across variable in Proc Report or column expression in Proc tabulate and then summarize the Changes variable per period as a sum.

But if the Name is to be used, how should be defined now as transposing a mix of numeric and character variables can be tricky (and REALLY ugly in a wide form).

art297
Opal | Level 21

@ballardw Transposing a combination of character and numeric isn't tricky with the %transpose macro. Additionally, besides not being tricky, the macro keeps character as character, numeric as numeric, as well as all formats.

 

E.g., 

filename tr url 'https://raw.githubusercontent.com/art297/transpose/master/transpose.sas';
%include tr ;

%transpose(data=sashelp.failure, out=want, by=process, Guessingrows=1000)

Art, CEO, AnalystFinder.com

 

mkeintz
PROC Star

 

I'm a little confused too.    In particular, I would be mighty curious in knowing what "log analysis" the putative supervisor would do (and how they would do it!) using the resulting irregularly wide transposed format.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
matoma
Obsidian | Level 7

I might be approaching this all wrong, but I attached the code I started writing. Everything up until the proc transpose is working and I'm not sure how to get it to work. I'll attach what the data looks like up until this point. 

data hw3.wikib;

set hw3.wiki;

date_new=put(datepart(date),yymon7.);

run;

 

proc sort data = hw3.wikib out=hw3.wikib;  

by user descending date_new ;  

run; 

 

data hw3.wikib;

set hw3.wikib;

by user;

if first.user then id + 1;

run;

 

data _null_;

set hw3.wikib;

call symputx ('user'||left(id),user);

run;

 

%let user=1;

proc transpose data=hw3.wikib out=hw3.wiki2 prefix=date_new;

by &user;

id date_new;

var changes;

run;

Picture1.png

art297
Opal | Level 21

It would help more if you attached the actual sas dataset rather than just a picture of it.

 

Also, in your proc transpose code you use "by &user.;" or, given the value you assigned to the macro variable "user", "by 1;". That won't work. Do you want one wide record for each value of user, or something else?

 

Art, CEO, AnalystFinder.com

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 817 views
  • 3 likes
  • 6 in conversation