DATA Step, Macro, Functions and more

Data restructuring logic

Reply
Contributor
Posts: 27

Data restructuring logic

Hi,
I have a question about data step logic.

I have a file of user activity by month/year.
id month activity
0001 01Jan09 4
0001 01Mar09 3
0001 01Jul09 1
0002 01Feb09 2
0002 01Apr09 2
0003 01Dec10 3
...
Notes:
1. file only shows activity by month by id, so if id was inactive then activity = 0 for that month (i.e for 0001 activity =0 for Feb, Apr, May etc..)
2. Data goes back to 2007 to current month

What I want to achieve is a data set that will fill in the inactive holes for all ids.
id month activity
0001 01Jan09 4
0001 01Feb09 0
0001 01Mar09 3
0001 01Apr09 0 .... current month
0002 01Jan09 0
0002 01Feb09 2 .... so on..

I thought about creating a data set with just dates and using a left merge, not sure if that will work..
If anyone has a solution to how I should tackle this that would be great!

Thanks for the help!
SAS Super FREQ
Posts: 8,862

Re: Data restructuring logic

Hi:
Your data for all 3 IDs spans 2 years: 2009 and 2010. So when you "fill" in the inactive months, would you want to see all 0 in 2010 months for ID=0001, for example??? And you would want to see all 0 in 2009 months for ID=0003???

cynthia
Contributor
Posts: 27

Re: Data restructuring logic

Posted in reply to Cynthia_sas
Yes, ideally thats what i'd like to see, but I understand it may take of time and processing power with 3 years data and 100k+ ids...

Now that I think about it, reversely, if I have 2 datasets, one that shows activity (already have), and the other with inactivity (all the months of activity = 0 for all id's), I might be able to do something with that..
SAS Super FREQ
Posts: 8,862

Re: Data restructuring logic

Hi:
Actually, although you could do this with 2 datasets and a merge, as you envision, you can also do it with a user-defined format and an option called "PRELOADFMT". PROC REPORT, PROC TABULATE and PROC MEANS all support PRELOADFMT. You can generate the PROC FORMAT step from a program if you want to make sure that the format always ends with the current month. I just built the format manually for the sake of the example.

The program below shows PROC REPORT and PROC MEANS methods of using PRELOADFMT.

cynthia

[pre]
data id_act;
infile datalines;
input id month : date7. activity;
return;
datalines;
0001 01Jan09 4
0001 01Mar09 3
0001 01Jul09 1
0002 01Feb09 2
0002 01Apr09 2
0003 01Dec10 3
0004 01Dec10 4
0004 01Jan11 5
;
run;

** if you build the format with a program, you can always end the format on the current month;
proc format;
value monfmt '01Jan09'd = '01Jan09'
'01Feb09'd = '01Feb09'
'01Mar09'd = '01Mar09'
'01Apr09'd = '01Apr09'
'01May09'd = '01May09'
'01Jun09'd = '01Jun09'
'01Jul09'd = '01Jul09'
'01Aug09'd = '01Aug09'
'01Sep09'd = '01Sep09'
'01Oct09'd = '01Oct09'
'01Nov09'd = '01Nov09'
'01Dec09'd = '01Dec09'
'01Jan10'd = '01Jan10'
'01Feb10'd = '01Feb10'
'01Mar10'd = '01Mar10'
'01Apr10'd = '01Apr10'
'01May10'd = '01May10'
'01Jun10'd = '01Jun10'
'01Jul10'd = '01Jul10'
'01Aug10'd = '01Aug10'
'01Sep10'd = '01Sep10'
'01Oct10'd = '01Oct10'
'01Nov10'd = '01Nov10'
'01Dec10'd = '01Dec10'
'01Jan11'd = '01Jan11' ;
run;

options missing=0;
proc report data=id_act nowd out=work.repout completerows;
title 'PROC REPORT uses PRELOADFMT and COMPLETEROWS';
title2 'Only using PROC REPORT to build dataset';
column id month activity;
define id / group f=z4.;
define month / group f=monfmt. order=internal preloadfmt;
define activity/ sum;
run;

proc print data=work.repout;
title 'Dataset from PROC REPORT';
format month date7. id z4.;
run;


proc means data=id_act sum nway completetypes;
title 'PROC MEANS uses PRELOADFMT and COMPLETETYPES';
title2 'Only using PROC MEANS to build dataset';
class id;
class month / preloadfmt;
var activity;
format month monfmt. id z4.;
output out=work.mnout(drop=_type_ _freq_) sum=activity;
run;

proc print data=work.mnout;
title 'Dataset from PROC MEANS';
run;
[/pre]
N/A
Posts: 0

Re: Data restructuring logic

hello,

i just want to bring to your attention a program for building the format which can be easily put in a macro with two parameters start and end:

filename test "your_location\format.txt";

data _null_;
file test;

start='01jan09'd;
end=mdy(month(date()),1,year(date()));
dif=intck('month',start,end);

do i=1 to dif;

if i=1 then put "proc format;" /
"value monfmt" "'" start date7. +(-0) "'d='" start date7. "'";

else do;
b=intnx('month',start,1,'same');
put "'" b date7. +(-0) "'d='" b date7. "'";
start=b;
end;

if i=dif then put ";run;";
end;

run;

%include "your_location\format.txt";

Marius
Super User
Posts: 10,018

Re: Data restructuring logic

[pre]
data have;
input id $ month : date9. activity ;
format month date9.;
cards;
0001 01Jan09 4
0001 01Mar09 3
0001 01Jul09 1
0002 01Feb09 2
0002 01Apr09 2
0003 01Dec10 3
;
run;
proc sort data=have;
by id month;
run;
%let start=01jan09;
%let end=01dec09;
data temp;
set have;
_month="&start"d;
if id ne lag(id) then do;
do i=1 to intck('month',"&start"d,"&end"d)+1;
_id=id; _activity=0; output;
_month=intnx('month',_month,1,'sameday');
end;
end;

format _month date9.;
keep _id _month _activity;
run;
proc sort data=temp(rename=(_id=id _month=month _activity=activity));
by id month;
run;
data want;
merge temp have;
by id month;
run;
[/pre]



Ksharp
Contributor
Posts: 27

Re: Data restructuring logic

Thank you for the help, problem solved.
Ask a Question
Discussion stats
  • 6 replies
  • 224 views
  • 0 likes
  • 4 in conversation