Hello,
I am trying to modify a dataset and change it's format. How do I add the appropriate line breaks described below?
Source data:
Trial | Timestep | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
1 | 0 | 2 | 2 | 2 | 3 | 4 | 4 |
1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 |
1 | 2 | 2 | 2 | 3 | 3 | 4 | 4 |
1 | 3 | 3 | 3 | 3 | 4 | 4 | 5 |
1 | 4 | 3 | 3 | 3 | 4 | 4 | 5 |
This spans across 9 "varx" variables. For each trial, there are 601 (0-600) timesteps. After 600 timesteps trial is set to 2 and timestep is reset to 0.
The data needs to be put in the following format (yes, I know the values for Var1-5 don't match here, no problem)
0001 | |||||||||
1116 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 |
1216 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 |
0117 | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 3 |
. | |||||||||
1166 | |||||||||
END | |||||||||
0002 | |||||||||
1116 | 1 | 1 | 2 | 2 | 3 | 3 | 3 | 4 | 5 |
1216 | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 5 |
0117 | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 |
5 |
The "0001" represents the first trial, below which is a date (mmyy) that goes through all 600 timesteps, at which point you see the "END" and "0002", and then repeats. 1116 would represent timestep 0, 1216 represents timestep 1, 0117 represents timestep 2, etc.
I've created a new indicator column which will represent the left most column in my second data set. I plan to add the "0001" at the top after the rest of the "0xxx" are successfully generated. Currently, I'm able to add the "END" in the correct places. I'm able to place a "0002" in properly as well but a 0002 is being placed after ALL of the "END"s, rather than being increased monotonically. Heres the code for END and "0xxx" insertion. I only have j running from 2 to 4 now as I'm testing with a subset.
data want;
set have;
output;
if timestep = 600;
array allnums {*} _numeric_ ;
array allchar {*} _character_ ;
drop i;
do i=1 to dim(allnums); allnums{i}=.; end;
do i=1 to dim(allchar); allchar{i}='END'; end;
output; /* Output blank observation */
run;
data want1;
set want;
output;
do j = 2 to 4 by 1 while(indicator = 'END');
array allnums {*} _numeric_ ;
array allchar {*} _character_ ;
drop i;
do i=1 to dim(allnums); allnums{i}=.; end;
do i=1 to dim(allchar); allchar{i}= "&j."; end;
output; /* Output blank observation */
j=j+1;
end;
run;
Any help would be greatly appreciated. Working on SAS EG5.1
Sure thing! Totally revamped the previous code
%Let YYMM = %sysfunc(mdy(11,01,2016)); *Start Date;
/*Add LineBreak*/
Data Work.LineBreak;
length Indicator $20.;
Set Have;
By Trial;
If First.Trial then do;
Indicator = cats('Scenario',Trial);
Output;
Indicator = put(&YYMM,mmyyn4.);
Output;
End;
Else if not First.Trial and not Last.Trial then do;
Indicator = put(intnx('month',&YYMM,Timestep),mmyyn4.);
Output;
End;
Else if Last.Trial then do;
Indicator = put(intnx('month',&YYMM,Timestep),mmyyn4.);
Output;
Indicator = 'END';
Output;
End;
Run;
/*Clear LineBreaks*/
Data Work.ClearBreak(drop=Trial Timestep);
Set Work.LineBreak;
Array AllNumVars _numeric_;
If substr(Indicator,1,8) = 'Scenario' or Indicator = 'END' then do;
Do over AllNumVars;
AllNumVars = .;
End;
End;
if substr(Indicator,1,8) = 'Scenario' then do;
indicator = substr(Indicator,9,4);
End;
if Indicator not = 'END' then do;
if length(Indicator) lt 4 then Indicator = cats(repeat('0',4-1-length(Indicator)),Indicator);
end;
run;
Is this the sort of thing you're after?
data have;
input Trial Timestep Var1 Var2 Var3 Var4 Var5 Var6;
cards;
1 0 2 2 2 3 4 4
1 1 2 2 2 3 3 4
1 2 2 2 3 3 4 4
1 3 3 3 3 4 4 5
1 4 3 3 3 4 4 5
2 0 2 2 2 3 4 4
2 1 2 2 2 3 3 4
2 2 2 2 3 3 4 4
2 3 3 3 3 4 4 5
2 4 3 3 3 4 4 5
3 0 2 2 2 3 4 4
3 1 2 2 2 3 3 4
3 2 2 2 3 3 4 4
3 3 3 3 3 4 4 5
3 4 3 3 3 4 4 5
run;
proc format;
value tsf
0 = "1116"
1 = "1216"
2 = "0117"
3 = "0217"
4 = "1166"
;
run;
data want;
length CharTrial $4;
set have;
by Trial;
drop Trial TimeStep Var1-Var6;
if first.Trial then do;
call missing(Out1, Out2, Out3, Out4, Out5, Out6);
CharTrial = put(Trial, z4.);
output;
end;
CharTrial = put(TimeStep, tsf.);
Out1 = Var1; Out2 = Var2; Out3 = Var3; Out4 = Var4; Out5 = Var5; Out6 = Var6;
output;
if last.Trial
then do;
CharTrial = "END";
call missing(Out1, Out2, Out3, Out4, Out5, Out6);
output;
end;
run;
Not sure we're on the same page, sorry if I wasn't clear. Taking the data you gave me and added a couple rows to show that timestep goes to 600;
Trial | Timestep | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 |
1 | 0 | 2 | 2 | 2 | 3 | 4 | 4 |
1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 |
1 | 2 | 2 | 2 | 3 | 3 | 4 | 4 |
1 | 3 | 3 | 3 | 3 | 4 | 4 | 5 |
1 | 4 | 3 | 3 | 3 | 4 | 4 | 5 |
1 | … | … | … | … | … | … | … |
1 | … | … | … | … | … | … | … |
1 | 600 | 4 | 4 | 5 | 5 | 5 | 6 |
2 | 0 | 2 | 2 | 2 | 3 | 4 | 4 |
2 | 1 | 2 | 2 | 2 | 3 | 3 | 4 |
2 | 2 | 2 | 2 | 3 | 3 | 4 | 4 |
2 | 3 | 3 | 3 | 3 | 4 | 4 | 5 |
2 | 4 | 3 | 3 | 3 | 4 | 4 | 5 |
2 | … | … | … | … | … | … | … |
2 | … | … | … | … | … | … | … |
2 | 600 | 5 | 6 | 6 | 6 | 7 | 7 |
3 | 0 | 2 | 2 | 2 | 3 | 4 | 4 |
3 | 1 | 2 | 2 | 2 | 3 | 3 | 4 |
3 | 2 | 2 | 2 | 3 | 3 | 4 | 4 |
3 | 3 | 3 | 3 | 3 | 4 | 4 | 5 |
3 | 4 | 3 | 3 | 3 | 4 | 4 | 5 |
In the current case,
0 = "1116"
1 = "1216"
2 = "0117"
3 = "0217"
4 = "1166"
is correct, but it needs to be flexible so that I can set some variable, e.g. "CurrentDate", to 1116, 0219, or any date in mmyy form and It will generate the appropriate date, increasing monthly, by timestep. Ideally, the program would also be flexible to take different values for number of trials and number of timesteps per trial. Final format for the above data would be:
0001 | ||||||
1116 | 2 | 2 | 2 | 3 | 4 | 4 |
1216 | 2 | 2 | 2 | 3 | 3 | 4 |
0117 | 2 | 2 | 3 | 3 | 4 | 4 |
0217 | 3 | 3 | 3 | 4 | 4 | 5 |
0317 | 3 | 3 | 3 | 4 | 4 | 5 |
… | … | … | … | … | … | … |
… | … | … | … | … | … | … |
1166 | 4 | 4 | 5 | 5 | 5 | 6 |
END | ||||||
0002 | ||||||
1116 | 2 | 2 | 2 | 3 | 4 | 4 |
1216 | 2 | 2 | 2 | 3 | 3 | 4 |
0117 | 2 | 2 | 3 | 3 | 4 | 4 |
0217 | 3 | 3 | 3 | 4 | 4 | 5 |
0317 | 3 | 3 | 3 | 4 | 4 | 5 |
… | … | … | … | … | … | … |
… | … | … | … | … | … | … |
1166 | 5 | 6 | 6 | 6 | 7 | 7 |
END | ||||||
0003 | ||||||
1116 | 2 | 2 | 2 | 3 | 4 | 4 |
1216 | 2 | 2 | 2 | 3 | 3 | 4 |
0117 | 2 | 2 | 3 | 3 | 4 | 4 |
0217 | 3 | 3 | 3 | 4 | 4 | 5 |
0317 | 3 | 3 | 3 | 4 | 4 | 5 |
… | … | … | … | … | … | … |
… | … | … | … | … | … | … |
1166 | … | … | … | … | … | … |
END |
Thanks
Solved, thanks.
Great news that you solved the problem, jbscholten! Can you share your solution for other community members who may run into the same issue?
Thanks!
Anna
Sure thing! Totally revamped the previous code
%Let YYMM = %sysfunc(mdy(11,01,2016)); *Start Date;
/*Add LineBreak*/
Data Work.LineBreak;
length Indicator $20.;
Set Have;
By Trial;
If First.Trial then do;
Indicator = cats('Scenario',Trial);
Output;
Indicator = put(&YYMM,mmyyn4.);
Output;
End;
Else if not First.Trial and not Last.Trial then do;
Indicator = put(intnx('month',&YYMM,Timestep),mmyyn4.);
Output;
End;
Else if Last.Trial then do;
Indicator = put(intnx('month',&YYMM,Timestep),mmyyn4.);
Output;
Indicator = 'END';
Output;
End;
Run;
/*Clear LineBreaks*/
Data Work.ClearBreak(drop=Trial Timestep);
Set Work.LineBreak;
Array AllNumVars _numeric_;
If substr(Indicator,1,8) = 'Scenario' or Indicator = 'END' then do;
Do over AllNumVars;
AllNumVars = .;
End;
End;
if substr(Indicator,1,8) = 'Scenario' then do;
indicator = substr(Indicator,9,4);
End;
if Indicator not = 'END' then do;
if length(Indicator) lt 4 then Indicator = cats(repeat('0',4-1-length(Indicator)),Indicator);
end;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.