I am taking a long dataset (repeated measures) some have 2 timepoints of data - some do not. When I use the following code to transpose the variables of interest to a wide dataset something his happening that is not transcribing values within a record. Whenever the observation has a record for time1 and time 2 - the first (time1) is entered as missing when there is a value in existence.
dataset I have
timestamp client age gender PCLscore CDRISCscore
4/08/22 xx 49 female 57 15
4/14/22 xx 49 female 52 17
5/12/22 xy 47 female 51 31
5/02/22 zz 46 male 51 27
5/13/22 zz 46 male 46 24
what I get;
client age gender timestamp1 timestamp2 PCLscore1 PCLscore2 CDRISCscore1 CDRISCscore3
xx 49 female . 4/14/22 . 52 . 17
xy 47 female 5/12/ 22 . 51 . 31 .
zz 46 male . 5/13/22 . 46 . 24
current code;
DATA eHome12_Match_&V ;
SET eHome11_Match_&V ;
BY client ;
KEEP client age gender timestamp1-timestamp2 PCL_Rscore1-PCL_Rscore2 CDRISC_Rscore1-CDRISC_Rscore2
PCS1-PCS2 MCS1-MCS2;
RETAIN client age gender ;
ARRAY adate(1:2) timestamp1-timestamp2;
ARRAY apcs(1:2) PCS1-PCS2;
ARRAY amcs(1:2) MCS1-MCS2;
ARRAY acdrisc(1:2) CDRISC_Rscore1-CDRISC_Rscore2;
ARRAY apcl(1:2) PCL_Rscore1-PCL_Rscore2;
IF first.client THEN
DO;
DO i = 1 to 2 ;
adate( i ) = . ;
acdrisc( i ) = . ;
apcs( i ) = .;
amcs( i ) = . ;
apcl( i ) = . ;
END;
END;
adate( time ) = timestamp ;
acdrisc( time ) = CDRISC_RScore ;
aPCL( time ) = PCL_RScore ;
aPCS( time ) = PCS ;
aMCS( time ) = MCS ;
IF last.client THEN OUTPUT ;
format timestamp1-timestamp2 mmddyy10.;
RUN;
@lberghammer wrote:
Whenever the observation has a record for time1 and time 2 - the first (time1) is entered as missing when there is a value in existence.
That's because you successfully populate the first element of the arrays of interest when you encounter the first record. But, because you do not retain those values, they are reset to missing whenever a second record is encountered, i.e. before data for that client is output.
Also I don't believe the code you present produced the output you show. That's because you use the TIME variable as an index of the arrays. Yet there is no time variable generated in your code, or shown in your input dataset.
Here is some simpler code (untested in the absence of sample data in a working data step):
DATA eHome12_Match_&V ;
do i=1 by 1 until (last.client);
SET eHome11_Match_&V ;
by client;
ARRAY adate(1:2) timestamp1-timestamp2;
ARRAY apcs(1:2) PCS1-PCS2;
ARRAY amcs(1:2) MCS1-MCS2;
ARRAY acdrisc(1:2) CDRISC_Rscore1-CDRISC_Rscore2;
ARRAY apcl(1:2) PCL_Rscore1-PCL_Rscore2;
adate{i}=timestamp;
acdrisc(i) = CDRISC_RScore ;
aPCL(i) = PCL_RScore ;
aPCS(i) = PCS ;
aMCS(i) = MCS ;
end;
KEEP client age gender timestamp1--pcl_rscore2 ;
run;
The main idea here is that all obs for a given client are read in during a single iteration of the data step (i.e. the SET statement is inside a do ... until group). So no retains are needed to overcome the problem I mentioned above.
BUT ... this program assumes you will never have more than 2 obs per client.
BUT ... this program assumes you will never have more than 2 obs per client.
It will just output multiple observations for those CLIENT values. So if there are 3 or 4 observations it will write 2 observations. If there are 5 or 6 then it will write 3 observations, etc.
@Tom quoted my comment
BUT ... this program assumes you will never have more than 2 obs per client.
and replied
It will just output multiple observations for those CLIENT values. So if there are 3 or 4 observations it will write 2 observations. If there are 5 or 6 then it will write 3 observations, etc.
If there are any clients with more than two obs this code will not write the last two observations for that client. It won't write any. When it encounters such a client, it will stop with an error message
ERROR: Array subscript out of range at line xxx column yyy.
The resulting data set will be partial - it will contain the desired results for all clients preceding the first client with more than 2 obs.
@mkeintz wrote:
@Tom quoted my comment
BUT ... this program assumes you will never have more than 2 obs per client.
and replied
It will just output multiple observations for those CLIENT values. So if there are 3 or 4 observations it will write 2 observations. If there are 5 or 6 then it will write 3 observations, etc.
If there are any clients with more than two obs this code will not write the last two observations for that client. It won't write any. When it encounters such a client, it will stop with an error message
ERROR: Array subscript out of range at line xxx column yyy.
The resulting data set will be partial - it will contain the desired results for all clients preceding the first client with more than 2 obs.
Did not realize they did not code the DO loop properly.
If you don't want it to fail when there are too many observations set the upperbound.
do i=1 to dim(adate) until (last.client);
Or if you actually used non-standard upper and lower bounds use
do i=lbound(adate) to ubound(adate) until (last.client);
is there a modification to extend time points/observations beyond 2?
You need to Retain all of the variables ending in 1 so they have values for the second record.
What do you expect to do with this "wide" data set that can't be done with the existing data?
data have; input (timestamp client age gender PCLscore CDRISCscore) ( $); cards; 4/08/22 xx 49 female 57 15 4/14/22 xx 49 female 52 17 5/12/22 xy 47 female 51 31 5/02/22 zz 46 male 51 27 5/13/22 zz 46 male 46 24 ; data have; set have; by client; if first.client and not last.client then call missing(timestamp,PCLscore,CDRISCscore); run; proc sql noprint; select max(n) into :n from (select count(*) as n from have group by client,age,gender); quit; proc summary data=have nway; class client age gender; output out=want(drop=_:) idgroup(out[&n.] (timestamp PCLscore CDRISCscore)=); run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.