Hello
For each person ID there are values in 10 months (From Jana 2019 till OCT 2019).
The structure of the name is tYYMM.
For each person ID there is also information of CriticalPoint (This is the date in structure YYMM).
For each ID I want to convert values to missing by the following rule:
for each ID there will be follow up of 3 months from 1 month after CriticalPoint.
for example:
ID=1
CriticalPoint=1903
so values in t1904 t1905 t1906 should be non missing and other values should be missing
ID=3
CriticalPoint=1905
so values in t1906 t1907 t1908 should be non missing and other values should be missing
What is the way to do it please?
In real life there is a data set with many rows so should find a clever way to do it
Data ttt;
input IDNumber CriticalPoint t1901 t1902 t1903 t1904 t1905 t1906 t1907 t1908 t1909 t1910 ;
cards;
1 1903 2 3 4 6 8 7 2 3 5 8
2 1902 5 3 1 2 3 4 5 7 6 9
3 1905 2 4 6 8 9 7 5 6 8 3
;
run;
How about
Data ttt;
input IDNumber CriticalPoint t1901 t1902 t1903 t1904 t1905 t1906 t1907 t1908 t1909 t1910 ;
cards;
1 1903 2 3 4 6 8 7 2 3 5 8
2 1902 5 3 1 2 3 4 5 7 6 9
3 1905 2 4 6 8 9 7 5 6 8 3
;
run;
data want(drop=i j);
set ttt;
array t{*} t1901-t1910;
do i = 1 to dim(t);
if CriticalPoint = input(compress(vname(t[i]),, 'kd'), 8.) then leave;
end;
do j = 1 to dim(t);
if not (i+1 <= j <= i+3) then t[j] = .;
end;
run;
Result
IDNumber CriticalPoint t1901 ------- t1910 1 1903 . . . 6 8 7 . . . . 2 1902 . . 1 2 3 . . . . . 3 1905 . . . . . 7 5 6 . .
Dealing with one year starting in JAN makes life easy, as month can be used as a pointer to an array:
Data ttt;
input IDNumber CriticalPoint t1901 t1902 t1903 t1904 t1905 t1906 t1907 t1908 t1909 t1910 ;
cards;
1 1903 2 3 4 6 8 7 2 3 5 8
2 1902 5 3 1 2 3 4 5 7 6 9
3 1905 2 4 6 8 9 7 5 6 8 3
;
run;
data want;
set ttt;
array tx {10} t1901-t1910;
mm = CriticalPoint - 1900;
do i = 1 to 10;
if i < mm or I > mm+2 then call missing(tx(i));
end;
run;
Data wanted;
set ttt;
if CriticalPoint=1902 then do;
t1906=.;t1907=.;t1908=.;t1909=.;t1910=.;
t1901=.;t1902=.;
end;
if CriticalPoint=1903 then do;
t1907=.;t1908=.;t1909=.;t1910=.;
t1901=.;t1902=.;t1903=.;
end;
if CriticalPoint=1905 then do;
t1909=.;t1910=.;
t1901=.;t1902=.;t1903=.;t1904=.;t1905=.;
end;
Run;
I would like to see a more useful way to do it
What do you mean? Both approaches above are useful?
Data have;
input IDNumber CriticalPoint t1901 t1902 t1903 t1904 t1905 t1906 t1907 t1908
t1909 t1910;
cards;
1 1903 2 3 4 6 8 7 2 3 5 8
2 1902 5 3 1 2 3 4 5 7 6 9
3 1905 2 4 6 8 9 7 5 6 8 3
;
run;
data want;
set have;
array t[10] t1901-t1910;
array tmp[3] _temporary_;
do i=1 to dim(t);
do j=1 to 3;
k=criticalpoint+j;
tmp[j]=vvaluex(cats("t", k));
if vname(t[i])=cats("t", k) then
do;
t[i]=tmp[j];
goto missing;
end;
end;
missing:
if vname(t[i]) ne cats("t", k) then
do;
t[i]=.;
end;
end;
drop i--k;
run;
@Ronein wrote:
Hello
For each person ID there are values in 10 months (From Jana 2019 till OCT 2019).
The structure of the name is tYYMM.
For each person ID there is also information of CriticalPoint (This is the date in structure YYMM).
For each ID I want to convert values to missing by the following rule:
for each ID there will be follow up of 3 months from 1 month after CriticalPoint.
for example:
ID=1
CriticalPoint=1903
so values in t1904 t1905 t1906 should be non missing and other values should be missing
ID=3
CriticalPoint=1905
so values in t1906 t1907 t1908 should be non missing and other values should be missing
What is the way to do it please?
In real life there is a data set with many rows so should find a clever way to do it
Data ttt; input IDNumber CriticalPoint t1901 t1902 t1903 t1904 t1905 t1906 t1907 t1908 t1909 t1910 ; cards; 1 1903 2 3 4 6 8 7 2 3 5 8 2 1902 5 3 1 2 3 4 5 7 6 9 3 1905 2 4 6 8 9 7 5 6 8 3 ; run;
Again, if you have a LONG data set rather than a wide data set, this is simple to do. You really don't seem to have grasped this point as we have discussed this many times before.
data want;
set have;
if month<intnx('month',criticalpoint,1,'b') or month>intnx('month',criticalpoint,
4,'b') then call missing(t);
run;
This assumes you have a true SAS date value for CRITICALPOINT, a new variable named MONTH which is also a true SAS date value, and your t-variables are named T. And of course, the analysis after you do this is also simplified with a long data set compared to a wide data set, as we have been discussing.
In addition, if your t-variables cross a year boundary, so they contain t1910 t1911 t1912 t2001 t2002 etc., none of the other solutions work without additional modification, but the long data set solution with actual SAS dates works properly without any additional modification.
As always, the useless wide format makes the code unnecessarily complicated.
Let's beat this monstrosity into shape:
Data ttt;
input IDNumber CriticalPoint t1901 t1902 t1903 t1904 t1905 t1906 t1907 t1908 t1909 t1910;
cards;
1 1903 2 3 4 6 8 7 2 3 5 8
2 1902 5 3 1 2 3 4 5 7 6 9
3 1905 2 4 6 8 9 7 5 6 8 3
;
proc transpose
data=ttt
out=long
;
by idnumber;
var t:;
run;
data long;
set long;
period = input('20'!!substr(_name_,2),yymmn6.);
format period yymmd7.;
drop _name_;
run;
data have;
set ttt;
cp = input('20'!!put(criticalpoint,z4.),yymmn6.);
format cp yymmd7.;
run;
data long;
set long;
period = input('20'!!substr(_name_,2),yymmn6.);
format period yymmd7.;
drop _name_;
run;
data have;
set ttt;
cp = input('20'!!put(criticalpoint,z4.),yymmn6.);
format cp yymmd7.;
keep idnumber cp;
run;
Then, all that's needed is this:
data want;
merge
have
long
;
by idnumber;
if -1 le intck('month',cp,period) le 1;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.