I have a data set that has two variables (v1, v2) for each month from 2006 to 2017. Every instance where v1 is null, v2 will also be null, and ever time v1 is not null, v2 will also not be null. An example of what the data looks like is;
data test;
input ID jan2006v1 feb2006v1 mar2006v1 jan2006v2 feb2006v2 mar2006v2;
datalines;
1 -1 . . 5 . .
2 . 2 . . 3 .
3 . . 2 . . 5
4 2 -3 1 4 6 5
5 . 1 2 . 4 3
;
run;
(In the datalines statement for some reason the fifth ID number is not going into a separate row when I post this)
v1 represents the difference from the previous value and v2 represents the current value itself. What I am trying to do is create two data sets. The first data set contains all the v2's that are currently 5 or 6 and came from 1,2,3, or 4. The way to determine this is, if you add v1 and v2 for each month, you should get the previous month's value. The first data set desired result should look like;
ID jan2006v1 feb2006v1 mar2006v1 jan2006v2 feb2006v2 mar2006v2
1 -1 . . 5 . .
4 . -3 . . 6 .
The second data set will be all values where v2 is currently 1,2,3, or 4 and came from 5 or 6. The second data set desired result should look like;
ID jan2006v1 feb2006v1 mar2006v1 jan2006v2 feb2006v2 mar2006v2
2 . 2 . . 3 .
4 2 . . 4 . .
5 . 1 2 . 4 3
Both data sets should still contain the all the variables that are in the original data set. If a value does not fit the criteria for that data set, it should be replaced with a null. If there are no values that fit the criteria for the ID number, then the row should be deleted from the data set.
Where I am getting confused is actually is being able to loop through the values for every month and separate them like I would like. My code produces the correct ID, but the thing that I can't figure out is how to apply a null to the columns that don't fit the criteria. Also my code produces duplicate rows if the id fits the criteria more than once in the same data set as it does in out2 feb2006 and mar2006. My incorrect code looks like this:
data out1 out2;
set test;
array v1 {*} jan2006v1--mar2006v1;
array v2 {*} jan2006v2--mar2006v2;
do i=1 to dim(v2);
if v2{i} in (5,6) and (v1{i} + v2{i}) in (1,2,3,4) then output out1;
if v2{i} in (1,2,3,4) and (v1{i} + v2{i}) in (5, 6) then output out2;
end;
run;
(also for some reason when I post this it configures it into one line instead of the way I have it displayed in the editor)
If someone could help me figure out how to get the data in a way that I need it I would be greatly appreciative.
Thanks,
Tom
Hello,
This is not elegant but it works.
It should be possible to write code for this that is much more concise.
options mprint;
PROC DATASETS library=work NOLIST;
delete test: / memtype=DATA;
delete out: / memtype=DATA;
run;
data test;
input ID jan2006v1 feb2006v1 mar2006v1 jan2006v2 feb2006v2 mar2006v2;
datalines;
1 -1 . . 5 . .
2 . 2 . . 3 .
3 . . 2 . . 5
4 2 -3 1 4 6 5
5 . 1 2 . 4 3
;
run;
%MACRO ARRAYv1v2;
array v1{*} jan2006v1--mar2006v1;
array v2{*} jan2006v2--mar2006v2;
%MEND ARRAYv1v2;
/* transpose to get the non-missing "MY v1a v2a" couples */
data test1;
LENGTH ID 8 MY $ 7 i 8 v1a 8 v2a 8;
set test;
%ARRAYv1v2
do i=1 to dim(v2);
v1a = v1{i};
v2a = v2{i};
MY = substr(vname(v1{i}),1,7);
*if NOT (v1a=. AND v2a=.) then output;
if v1a^=. then output;
end;
run;
data test2;
set test1(drop=i);
%ARRAYv1v2
do i=1 to dim(v2);
if substr(vname(v1{i}),1,7) NE MY then do; v1{i}=.; v2{i}=.; end;
end;
run;
data out1 out2;
set test2(drop=i);
%ARRAYv1v2
/*
if v2a in (5,6) and (v1a + v2a) in (1,2,3,4) then output out1;
if v2a in (1,2,3,4) and (v1a + v2a) in (5,6) then output out2;
*/
/* WE NEED i */
do i=1 to dim(v2);
if v2{i} in (5,6) and (v1{i} + v2{i}) in (1,2,3,4) then output out1;
if v2{i} in (1,2,3,4) and (v1{i} + v2{i}) in (5,6) then output out2;
end;
run;
%MACRO out1out2;
%DO k=1 %TO 2;
proc sort data=out&k.; by ID i; run;
/* COLLAPSE rows with same ID */
data out&k.(drop=jan2006v1--mar2006v1 jan2006v2--mar2006v2 i j MY v1a v2a);
set out&k.;
by ID i;
%ARRAYv1v2
array v1b{*} jan2006v1b feb2006v1b mar2006v1b;
array v2b{*} jan2006v2b feb2006v2b mar2006v2b;
retain jan2006v1b feb2006v1b mar2006v1b;
retain jan2006v2b feb2006v2b mar2006v2b;
if first.ID then do j=1 to dim(v2); v1b{j}=.; v2b{j}=.; end;
if v1{i}^=. then v1b{i}=v1{i};
if v2{i}^=. then v2b{i}=v2{i};
if last.ID then output;
run;
%END; /* %DO k=1 to 2; */
%MEND out1out2;
%out1out2
/* end of program */
Cheers,
Koen
This is harder than it looks because some rows meet BOTH criteria and need to be output to both data sets, albeit with different variables made missing. You'll need a way of storing the original values and restoring them after setting the unneeded ones to missing for output. Try this instead:
data test;
input ID jan2006v1 feb2006v1 mar2006v1 jan2006v2 feb2006v2 mar2006v2;
datalines;
1 -1 . . 5 . .
2 . 2 . . 3 .
3 . . 2 . . 5
4 2 -3 1 4 6 5
5 . 1 2 . 4 3
;
run;
data out1 out2;
drop j i;
set test;
array v1 [*] jan2006v1--mar2006v1;
array v2 [*] jan2006v2--mar2006v2;
array _v1 [3] _temporary_;
array _v2 [3] _temporary_;
do i=1 to dim(v1);
_v1[i]=v1[i];
_v2[i]=v2[i];
end;
do i=1 to dim(v2);
if _v2[i] in (5,6) and (_v1[i] + _v2[i]) in (1,2,3,4) then do;
do j=1 to dim(v1);
if not (_v2[i] in (5,6) and (_v1[i] + _v2[i]) in (1,2,3,4)) then call missing (v1[j],v2[j]);
end;
v1[i]=_v1[i];
v2[i]=_v2[i];
output out1;
do j=1 to dim(v1);
v1[i]=_v1[i];
v2[i]=_v2[i];
end;
end;
if v2[i] in (1,2,3,4) and (v1[i] + v2[i]) in (5, 6) then do;
do j=1 to dim(v1);
if not (v2[i] in (1,2,3,4) and (v1[i] + v2[i]) in (5, 6)) then call missing (v1[j],v2[j]);
end;
v1[i]=_v1[i];
v2[i]=_v2[i];
output out2;
do j=1 to dim(v1);
v1[i]=_v1[i];
v2[i]=_v2[i];
end;
end;
end;
run;
Hello,
This is not elegant but it works.
It should be possible to write code for this that is much more concise.
options mprint;
PROC DATASETS library=work NOLIST;
delete test: / memtype=DATA;
delete out: / memtype=DATA;
run;
data test;
input ID jan2006v1 feb2006v1 mar2006v1 jan2006v2 feb2006v2 mar2006v2;
datalines;
1 -1 . . 5 . .
2 . 2 . . 3 .
3 . . 2 . . 5
4 2 -3 1 4 6 5
5 . 1 2 . 4 3
;
run;
%MACRO ARRAYv1v2;
array v1{*} jan2006v1--mar2006v1;
array v2{*} jan2006v2--mar2006v2;
%MEND ARRAYv1v2;
/* transpose to get the non-missing "MY v1a v2a" couples */
data test1;
LENGTH ID 8 MY $ 7 i 8 v1a 8 v2a 8;
set test;
%ARRAYv1v2
do i=1 to dim(v2);
v1a = v1{i};
v2a = v2{i};
MY = substr(vname(v1{i}),1,7);
*if NOT (v1a=. AND v2a=.) then output;
if v1a^=. then output;
end;
run;
data test2;
set test1(drop=i);
%ARRAYv1v2
do i=1 to dim(v2);
if substr(vname(v1{i}),1,7) NE MY then do; v1{i}=.; v2{i}=.; end;
end;
run;
data out1 out2;
set test2(drop=i);
%ARRAYv1v2
/*
if v2a in (5,6) and (v1a + v2a) in (1,2,3,4) then output out1;
if v2a in (1,2,3,4) and (v1a + v2a) in (5,6) then output out2;
*/
/* WE NEED i */
do i=1 to dim(v2);
if v2{i} in (5,6) and (v1{i} + v2{i}) in (1,2,3,4) then output out1;
if v2{i} in (1,2,3,4) and (v1{i} + v2{i}) in (5,6) then output out2;
end;
run;
%MACRO out1out2;
%DO k=1 %TO 2;
proc sort data=out&k.; by ID i; run;
/* COLLAPSE rows with same ID */
data out&k.(drop=jan2006v1--mar2006v1 jan2006v2--mar2006v2 i j MY v1a v2a);
set out&k.;
by ID i;
%ARRAYv1v2
array v1b{*} jan2006v1b feb2006v1b mar2006v1b;
array v2b{*} jan2006v2b feb2006v2b mar2006v2b;
retain jan2006v1b feb2006v1b mar2006v1b;
retain jan2006v2b feb2006v2b mar2006v2b;
if first.ID then do j=1 to dim(v2); v1b{j}=.; v2b{j}=.; end;
if v1{i}^=. then v1b{i}=v1{i};
if v2{i}^=. then v2b{i}=v2{i};
if last.ID then output;
run;
%END; /* %DO k=1 to 2; */
%MEND out1out2;
%out1out2
/* end of program */
Cheers,
Koen
I appreciate all the time and effort that you all put in to help me with this! Thanks, the solution works exactly like I needed. I will try to carry forward what I learned to use in the future.
Thanks for the help,
Tom
It is very easy for IML code.
data test;
input ID jan2006v1 feb2006v1 mar2006v1 jan2006v2 feb2006v2 mar2006v2;
datalines;
1 -1 . . 5 . .
2 . 2 . . 3 .
3 . . 2 . . 5
4 2 -3 1 4 6 5
5 . 1 2 . 4 3
;
run;
proc iml;
use test;
read all var {id};
read all var {jan2006v1 feb2006v1 mar2006v1} into x1[c=vname1];
read all var {jan2006v2 feb2006v2 mar2006v2} into x2[c=vname2];
close;
pre=x1+x2;
idx=loc(element(x2,{5 6}) & element(pre,{1 2 3 4}));
temp=j(nrow(x1),ncol(x1),.);
_temp=j(nrow(x1),ncol(x1),.);
temp[idx]=x1[idx];
_temp[idx]=x2[idx];
_idx=loc(countn(temp,'row') ^= 0);
want1=id[_idx]||temp[_idx,]||_temp[_idx,];
create want1 from want1[c=('id'||vname1||vname2)];
append from want1;
close;
idx=loc(element(x2,{1 2 3 4}) & element(pre,{5 6}));
temp=j(nrow(x1),ncol(x1),.);
_temp=j(nrow(x1),ncol(x1),.);
temp[idx]=x1[idx];
_temp[idx]=x2[idx];
_idx=loc(countn(temp,'row') ^= 0);
want2=id[_idx]||temp[_idx,]||_temp[_idx,];
create want2 from want2[c=('id'||vname1||vname2)];
append from want2;
close;
quit;
proc print data=want1 noobs;run;
proc print data=want2 noobs;run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.