BookmarkSubscribeRSS Feed
lberghammer
Calcite | Level 5

I am taking a long dataset (repeated measures) some have 2 timepoints of data - some do not. When I use the following code to transpose the variables of interest to a wide dataset something his happening that is not transcribing values within a record. Whenever the observation has a record for time1 and time 2 - the first (time1) is entered as missing when there is a value in existence.

 

dataset I have

timestamp    client    age    gender    PCLscore  CDRISCscore

4/08/22          xx        49     female         57                 15

4/14/22          xx        49     female         52                 17

5/12/22          xy        47     female         51                 31

5/02/22          zz        46     male            51                 27

5/13/22          zz        46     male            46                 24

 

what I get;

client   age   gender    timestamp1  timestamp2 PCLscore1  PCLscore2 CDRISCscore1    CDRISCscore3

xx        49     female              .          4/14/22               .                  52                     .                         17

xy        47     female        5/12/ 22          .                   51                 .                      31                         .

zz        46     male                .             5/13/22              .                 46                      .                        24

 

 

current code;


DATA eHome12_Match_&V ;
SET eHome11_Match_&V ;
BY client ;

KEEP client age gender timestamp1-timestamp2 PCL_Rscore1-PCL_Rscore2 CDRISC_Rscore1-CDRISC_Rscore2
PCS1-PCS2 MCS1-MCS2;
RETAIN client age gender ;

ARRAY adate(1:2) timestamp1-timestamp2;
ARRAY apcs(1:2) PCS1-PCS2;
ARRAY amcs(1:2) MCS1-MCS2;
ARRAY acdrisc(1:2) CDRISC_Rscore1-CDRISC_Rscore2;
ARRAY apcl(1:2) PCL_Rscore1-PCL_Rscore2;


IF first.client THEN
DO;
DO i = 1 to 2 ;
adate( i ) = . ;
acdrisc( i ) = . ;
apcs( i ) = .;
amcs( i ) = . ;
apcl( i ) = . ;
END;
END;

adate( time ) = timestamp ;
acdrisc( time ) = CDRISC_RScore ;
aPCL( time ) = PCL_RScore ;
aPCS( time ) = PCS ;
aMCS( time ) = MCS ;

IF last.client THEN OUTPUT ;

format timestamp1-timestamp2 mmddyy10.;
RUN;

7 REPLIES 7
mkeintz
PROC Star

@lberghammer wrote:

Whenever the observation has a record for time1 and time 2 - the first (time1) is entered as missing when there is a value in existence.

That's because you successfully populate the first element of the arrays of interest when you encounter the first record.  But, because you do not retain those values, they are reset to missing whenever a second record is encountered, i.e. before data for that client is output.

 

Also I don't believe the code you present produced the output you show.  That's because you use the TIME variable as an index of the arrays.  Yet there is no time variable generated in your code, or shown in your input dataset.

 

Here is some simpler code (untested in the absence of sample data in a working data step): 

DATA eHome12_Match_&V ;
  do i=1 by 1 until (last.client);
    SET eHome11_Match_&V ;
    by client;

    ARRAY adate(1:2) timestamp1-timestamp2;
    ARRAY apcs(1:2) PCS1-PCS2;
    ARRAY amcs(1:2) MCS1-MCS2;
    ARRAY acdrisc(1:2) CDRISC_Rscore1-CDRISC_Rscore2;
    ARRAY apcl(1:2) PCL_Rscore1-PCL_Rscore2;

    adate{i}=timestamp;
    acdrisc(i) = CDRISC_RScore ;
    aPCL(i) = PCL_RScore ;
    aPCS(i) = PCS ;
    aMCS(i) = MCS ;
  end;
  KEEP client age gender timestamp1--pcl_rscore2 ;
run;

The main idea here is that all obs for a given client are read in during a single iteration of the data step  (i.e. the SET statement is inside a do ... until group).   So no retains are needed to overcome the problem I mentioned above.

 

BUT ... this program assumes you will never have more than 2 obs per client.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

BUT ... this program assumes you will never have more than 2 obs per client.

It will just output multiple observations for those CLIENT values.  So if there are 3 or 4 observations it will write 2 observations. If there are 5 or 6 then it will write 3 observations, etc.

mkeintz
PROC Star

@Tom quoted my comment

BUT ... this program assumes you will never have more than 2 obs per client.

and replied

 

 

It will just output multiple observations for those CLIENT values.  So if there are 3 or 4 observations it will write 2 observations. If there are 5 or 6 then it will write 3 observations, etc.


If there are any clients with more than two obs this code will not write the last two observations for that client.  It won't write any.  When it encounters such a client, it will stop with an error message

ERROR: Array subscript out of range at line xxx column yyy.

The resulting data set will be partial - it will contain the desired results for all clients preceding the first client with more than 2 obs.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

@mkeintz wrote:

@Tom quoted my comment

BUT ... this program assumes you will never have more than 2 obs per client.

and replied

 

 

It will just output multiple observations for those CLIENT values.  So if there are 3 or 4 observations it will write 2 observations. If there are 5 or 6 then it will write 3 observations, etc.


If there are any clients with more than two obs this code will not write the last two observations for that client.  It won't write any.  When it encounters such a client, it will stop with an error message

ERROR: Array subscript out of range at line xxx column yyy.

The resulting data set will be partial - it will contain the desired results for all clients preceding the first client with more than 2 obs.

 

 


Did not realize they did not code the DO loop properly.

If you don't want it to fail when there are too many observations set the upperbound.

  do i=1 to dim(adate) until (last.client);

Or if you actually used non-standard upper and lower bounds use 

 do i=lbound(adate) to ubound(adate) until (last.client);
lberghammer
Calcite | Level 5

is there a modification to extend time points/observations beyond 2?

ballardw
Super User

You need to Retain all of the variables ending in 1 so they have values for the second record.

 

What do you expect to do with this "wide" data set that can't be done with the existing data?

 

 

Ksharp
Super User
data have;
input (timestamp    client    age    gender    PCLscore  CDRISCscore) ( $);
cards;
4/08/22          xx        49     female         57                 15
4/14/22          xx        49     female         52                 17
5/12/22          xy        47     female         51                 31
5/02/22          zz        46     male            51                 27
5/13/22          zz        46     male            46                 24
;

data have;
 set have;
 by client;
 if first.client and not last.client then call missing(timestamp,PCLscore,CDRISCscore);
run;

proc sql noprint;
select max(n) into :n from
(select count(*) as n from have group by client,age,gender);
quit;

proc summary data=have nway;
class client age gender;
output out=want(drop=_:) idgroup(out[&n.] (timestamp PCLscore CDRISCscore)=);
run;
 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1510 views
  • 0 likes
  • 5 in conversation