Hello SAS users. I am completely fresh in the field of SAS programming.
I have a rather large dataset, with 143 variables (one for each week of the year, for the years 2016-2018 week 39), named y_1601-y_1652, y_1701-y_1752, and y_1801-y_1839. In each of these weeks there is either a numeric code or a missing value.
What I am interested in is counting the lengths of consecutive occurences of 111 and create a variable with this information.
This in itself is easy enough, the tricky part is that I want to allow for a period of up to 4 weeks without 111 occuring, and still be counted as a part of the sequence of 111 occuring in a row. At the same time, if there are more than 4 consecutive weeks without 111 occuring, I want the number of consecutive occurences to be stored in a variable, if 111 starts occurring once more this information to be stored in another variable.
Unfortunately, I am unable to give a sample of my data, however I think this example illustrates my point:
Data have:
input id y_1601 y_1602 y_1603 y_1604 y_1605 y_1606 y_1607 y_1608 y_1609 y_1610 y_1611 y_1612;
datalines;
1 111 111 111 111 . . . . . . . .
2 23 . 111 111 111 . . . . . 111 111
3 52 111 111 111 111 87 87 111 111 111 111 111;
Data want:
input id y_1601 y_1602 y_1603 y_1604 y_1605 y_1606 y_1607 y_1608 y_1609 y_1610 y_1611 y_1612 seq1 seq2;
datalines;
1 111 111 111 111 . . . . . . . . 4 .
2 23 . 111 111 111 . . . . . 111 111 3 2
3 52 111 111 111 111 87 87 111 111 111 111 111 11 .;
Any help is greatly appreciated!
I'm not sure what you are suggesting, might be my limited knowledge of SAS keeping me back.
I am however working on a different idea, and would like to hear your take on it. I don't mind altering the dataset, so my basic idea is to get rid off the gaps by filling out the gaps with 111, and then afterwards doing some simple enumeration. I am however having some trouble with my code that I hope you can help me out with, and am quite sure it's because of my limited experience with arrays in SAS. The code I've tried is the following:
data want;
set have;
ARRAY gaps y_1601--y_1839;
do i=1 to 138;
if gaps[i]=111 and gaps[i+1]^=111 and (gaps[i+2]=111 or gaps[i+3]=111 or gaps[i+4]=111 or gaps[i+5]=111) then do;
if gaps[i+2]=111 then gaps[i+1]=111;
else if gaps[i+3]=111 then gaps[i+1]=111 and gaps[i+2]=111;
else if gaps[i+4]=111 then gaps[i+1]=111 and gaps[i+2]=111 and gaps[i+3]=111;
else if gaps[i+5]=111 then gaps[i+1]=111 and gaps[i+2]=111 and gaps[i+3]=111 and gaps[i+4]=111;
end;
end;
run;
I am however quite sure I am doing something wrong in terms of the if statements and the use of the logical expression. I hope you can help me!
try this and you might find your data easier to work with:
data foo;
input id y_1601 y_1602 y_1603 y_1604 y_1605 y_1606 y_1607 y_1608 y_1609 y_1610 y_1611 y_1612;
datalines;
1 111 111 111 111 . . . . . . . .
2 23 . 111 111 111 . . . . . 111 111
3 52 111 111 111 111 87 87 111 111 111 111 111
run;
PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
SELECT T.y_1601, T.y_1602, T.y_1603, T.y_1604, T.y_1605, T.y_1606, T.y_1607, T.y_1608, T.y_1609, T.y_1610, T.y_1611, T.y_1612
FROM WORK.FOO as T
;
QUIT;
PROC TRANSPOSE DATA=WORK.SORTTempTableSorted
OUT=WORK.TRNSTransposed(LABEL="Transposed WORK.FOO")
PREFIX=Column
NAME=Source
LABEL=Label
;
VAR y_1601 y_1602 y_1603 y_1604 y_1605 y_1606 y_1607 y_1608 y_1609 y_1610 y_1611 y_1612;
RUN; QUIT;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.