- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is my dataset:
Client | Gender | Test_ID | Test_Date |
A | F | 123 | 5/1/2015 |
A | F | 567 | 11/1/2015 |
B | M | 789 | 1/1/2017 |
B | M | 548 | 6/1/2017 |
C | M | 121 | 8/1/2016 |
C | M | 540 | 2/1/2017 |
I want this dataset:
Client | Gender | Test_ID_one | Test_ID_two | Test_Date_one | Test_date_two |
A | F | 123 | 567 | 5/1/2015 | 11/1/2015 |
B | M | 789 | 548 | 1/1/2017 | 6/1/2017 |
C | M | 121 | 540 | 8/1/2016 | 2/1/2017 |
I think this is a transpose? Any help would be great!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Does each client level always only contain two observations?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
in my current dataset, yes - there are only 2 observations per client. But, I may expand that to multiple obs per client (not exceeding 25 obs)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, PROC TRANSPOSE is the tool. I did a simple google search
transpose site:sas.com
and the first hit had an example of what you are looking for
https://support.sas.com/resources/papers/proceedings09/060-2009.pdf
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
May require a double transpose and the transpose will likely convert the dates to character
If you are OK with this, I can try
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Client $ Gender $ test_id $ Test_Date :mmddyy10.;
format test_date mmddyy10.;
cards;
A F 123 5/1/2015
A F 567 11/1/2015
B M 789 1/1/2017
B M 548 6/1/2017
C M 121 8/1/2016
C M 540 2/1/2017
;
data want;
do _n_=1 by 1 until(last.client);
set have;
by Client ;
array j(*)$ test_id1-test_id2 ;
array k(*) Test_Date1-Test_Date2;
if _n_=1 then call missing(of j(*),of k(*));
j(_n_)=test_id;
k(_n_)=Test_Date;
end;
format test_Date: mmddyy10.;
drop test_id test_date;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So lets look at what @kt_080517 has where they may have multipole test for a client how would they make test_id2 and test_date2 dynamic so they would not have to edit the program and replace the test_id2 with test_id3?
data have;
input Client $ Gender $ test_id $ Test_Date :mmddyy10.;
format test_date mmddyy10.;
cards;
A F 123 5/1/2015
A F 124 5/4/2015
A F 567 11/1/2015
B M 789 1/1/2017
B M 790 2/21/2017
B M 548 6/1/2017
C M 121 8/1/2016
C M 540 2/1/2017
;
data want;
do _n_=1 by 1 until(last.client);
set have;
by Client ;
array j(*)$ test_id1-test_id3 ;
array k(*) Test_Date1-Test_Date3;
if _n_=1 then call missing(of j(*),of k(*));
j(_n_)=test_id;
k(_n_)=Test_Date;
end;
format test_Date: mmddyy10.;
drop test_id test_date;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, what do you mean by this So lets look at what kt_080517 has where??
I am not catching your suggestion
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can Proc SQL be used here? to bring a value from obs other than the first into a column of its own so all data is in one row?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
why sql?
Sql can help in getting the array subscript(number of elements) as a macro to make the code i shared dynamic. however, for transpose either data-step or proc transpose is best
So this should handle multiple id requirement:
data have;
input Client $ Gender $ test_id $ Test_Date :mmddyy10.;
format test_date mmddyy10.;
cards;
A F 123 5/1/2015
A F 567 11/1/2015
B M 789 1/1/2017
B M 548 6/1/2017
C M 121 8/1/2016
C M 540 2/1/2017
;
proc sql;
select max(cnt) into : m trimmed
from (select count(client) as cnt from have group by client);
quit;
data want;
do _n_=1 by 1 until(last.client);
set have(rename=(test_id=_test_id Test_Date=_Test_Date));
by Client ;
array test_id(&m)$ ;
array Test_Date(&m);
if _n_=1 then call missing(of test_id(*),of Test_Date(*));
test_id(_n_)=_test_id;
Test_Date(_n_)=_Test_Date;
end;
format test_Date: mmddyy10.;
drop _:;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That works well with a data step using BY statement (by client)
Collect test dates and test_ID in the easiest way, as an array.
Suppose there are no more than 10 tests for each client.
Data client_tests ;
Do test_n = 1 by 1 until( last.client) ;
Set your_dataset end= finished ;
By client;
Array testDate(10) ;
Array testID(10) $10 ;
IF TEST_N > dim( testdate) then do ;
Err_ct +1 ;
If err_ct =1 then
Put "Error: need to widen the array" / _all_ ;
Continue ;
testDate( test_n ) = test_date ;
TestID( test_n ) = test_id ;
End ;
If finished and err_ct >1 then
Put 'WARNING: found ' err_ct ' inputs outside array.' ;
Run;
You might not need as many as 10.
You could collect the max number of tests for a client within the data step.
As the test dates and IDs are stored in array variables testDate1 to testDate10 and testID1 to testID10 you could drop the vars test_date and test_id, along with ER_CT.
On each output row the variable test_n indicates 1 more than the number of tests for each client.
I have assumed that taking the sex value from the last client row provides correct results.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@kt_080517 wrote:
This is my dataset:
Client Gender Test_ID Test_Date A F 123 5/1/2015 A F 567 11/1/2015 B M 789 1/1/2017 B M 548 6/1/2017 C M 121 8/1/2016 C M 540 2/1/2017
I want this dataset:
Client Gender Test_ID_one Test_ID_two Test_Date_one Test_date_two A F 123 567 5/1/2015 11/1/2015 B M 789 548 1/1/2017 6/1/2017 C M 121 540 8/1/2016 2/1/2017
I think this is a transpose? Any help would be great!
Another of these combine rows question and I just spent yet another couple of hours or so undoing data provided in that format so it can actually be used...
Repeat after me: I hate spreadsheets.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input Client $ Gender $ test_id $ Test_Date :mmddyy10.;
format test_date mmddyy10.;
cards;
A F 123 5/1/2015
A F 567 11/1/2015
B M 789 1/1/2017
B M 548 6/1/2017
C M 121 8/1/2016
C M 540 2/1/2017
;
proc summary data=have nway;
class Client ;
output out=want idgroup(out[2] ( Gender test_id Test_Date)=);
run;