BookmarkSubscribeRSS Feed
KDang
Fluorite | Level 6
Hi,
I have a question about data step logic.

I have a file of user activity by month/year.
id month activity
0001 01Jan09 4
0001 01Mar09 3
0001 01Jul09 1
0002 01Feb09 2
0002 01Apr09 2
0003 01Dec10 3
...
Notes:
1. file only shows activity by month by id, so if id was inactive then activity = 0 for that month (i.e for 0001 activity =0 for Feb, Apr, May etc..)
2. Data goes back to 2007 to current month

What I want to achieve is a data set that will fill in the inactive holes for all ids.
id month activity
0001 01Jan09 4
0001 01Feb09 0
0001 01Mar09 3
0001 01Apr09 0 .... current month
0002 01Jan09 0
0002 01Feb09 2 .... so on..

I thought about creating a data set with just dates and using a left merge, not sure if that will work..
If anyone has a solution to how I should tackle this that would be great!

Thanks for the help!
6 REPLIES 6
Cynthia_sas
SAS Super FREQ
Hi:
Your data for all 3 IDs spans 2 years: 2009 and 2010. So when you "fill" in the inactive months, would you want to see all 0 in 2010 months for ID=0001, for example??? And you would want to see all 0 in 2009 months for ID=0003???

cynthia
KDang
Fluorite | Level 6
Yes, ideally thats what i'd like to see, but I understand it may take of time and processing power with 3 years data and 100k+ ids...

Now that I think about it, reversely, if I have 2 datasets, one that shows activity (already have), and the other with inactivity (all the months of activity = 0 for all id's), I might be able to do something with that..
Cynthia_sas
SAS Super FREQ
Hi:
Actually, although you could do this with 2 datasets and a merge, as you envision, you can also do it with a user-defined format and an option called "PRELOADFMT". PROC REPORT, PROC TABULATE and PROC MEANS all support PRELOADFMT. You can generate the PROC FORMAT step from a program if you want to make sure that the format always ends with the current month. I just built the format manually for the sake of the example.

The program below shows PROC REPORT and PROC MEANS methods of using PRELOADFMT.

cynthia

[pre]
data id_act;
infile datalines;
input id month : date7. activity;
return;
datalines;
0001 01Jan09 4
0001 01Mar09 3
0001 01Jul09 1
0002 01Feb09 2
0002 01Apr09 2
0003 01Dec10 3
0004 01Dec10 4
0004 01Jan11 5
;
run;

** if you build the format with a program, you can always end the format on the current month;
proc format;
value monfmt '01Jan09'd = '01Jan09'
'01Feb09'd = '01Feb09'
'01Mar09'd = '01Mar09'
'01Apr09'd = '01Apr09'
'01May09'd = '01May09'
'01Jun09'd = '01Jun09'
'01Jul09'd = '01Jul09'
'01Aug09'd = '01Aug09'
'01Sep09'd = '01Sep09'
'01Oct09'd = '01Oct09'
'01Nov09'd = '01Nov09'
'01Dec09'd = '01Dec09'
'01Jan10'd = '01Jan10'
'01Feb10'd = '01Feb10'
'01Mar10'd = '01Mar10'
'01Apr10'd = '01Apr10'
'01May10'd = '01May10'
'01Jun10'd = '01Jun10'
'01Jul10'd = '01Jul10'
'01Aug10'd = '01Aug10'
'01Sep10'd = '01Sep10'
'01Oct10'd = '01Oct10'
'01Nov10'd = '01Nov10'
'01Dec10'd = '01Dec10'
'01Jan11'd = '01Jan11' ;
run;

options missing=0;
proc report data=id_act nowd out=work.repout completerows;
title 'PROC REPORT uses PRELOADFMT and COMPLETEROWS';
title2 'Only using PROC REPORT to build dataset';
column id month activity;
define id / group f=z4.;
define month / group f=monfmt. order=internal preloadfmt;
define activity/ sum;
run;

proc print data=work.repout;
title 'Dataset from PROC REPORT';
format month date7. id z4.;
run;


proc means data=id_act sum nway completetypes;
title 'PROC MEANS uses PRELOADFMT and COMPLETETYPES';
title2 'Only using PROC MEANS to build dataset';
class id;
class month / preloadfmt;
var activity;
format month monfmt. id z4.;
output out=work.mnout(drop=_type_ _freq_) sum=activity;
run;

proc print data=work.mnout;
title 'Dataset from PROC MEANS';
run;
[/pre]
deleted_user
Not applicable
hello,

i just want to bring to your attention a program for building the format which can be easily put in a macro with two parameters start and end:

filename test "your_location\format.txt";

data _null_;
file test;

start='01jan09'd;
end=mdy(month(date()),1,year(date()));
dif=intck('month',start,end);

do i=1 to dif;

if i=1 then put "proc format;" /
"value monfmt" "'" start date7. +(-0) "'d='" start date7. "'";

else do;
b=intnx('month',start,1,'same');
put "'" b date7. +(-0) "'d='" b date7. "'";
start=b;
end;

if i=dif then put ";run;";
end;

run;

%include "your_location\format.txt";

Marius
Ksharp
Super User
[pre]
data have;
input id $ month : date9. activity ;
format month date9.;
cards;
0001 01Jan09 4
0001 01Mar09 3
0001 01Jul09 1
0002 01Feb09 2
0002 01Apr09 2
0003 01Dec10 3
;
run;
proc sort data=have;
by id month;
run;
%let start=01jan09;
%let end=01dec09;
data temp;
set have;
_month="&start"d;
if id ne lag(id) then do;
do i=1 to intck('month',"&start"d,"&end"d)+1;
_id=id; _activity=0; output;
_month=intnx('month',_month,1,'sameday');
end;
end;

format _month date9.;
keep _id _month _activity;
run;
proc sort data=temp(rename=(_id=id _month=month _activity=activity));
by id month;
run;
data want;
merge temp have;
by id month;
run;
[/pre]



Ksharp
KDang
Fluorite | Level 6
Thank you for the help, problem solved.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 906 views
  • 0 likes
  • 4 in conversation