Hi,
I have a dataset that needs to be modified to show trends over time.
I think I need to transpose but I'm unsure on the best way to code it.
Have:
ID | Date | Date1 | FB | B |
1 | 30/09/2018 | 30/06/2019 | 2.34 | 2.47 |
1 | 30/09/2018 | 31/03/2019 | 2.34 | 2.48 |
1 | 30/09/2018 | 31/12/2018 | 2.34 | 2.49 |
1 | 30/09/2018 | 30/09/2018 | 2.34 | 2.50 |
2 | 31/03/2019 | 30/06/2019 | 2.82 | 2.74 |
2 | 31/03/2019 | 31/03/2019 | 2.82 | 2.75 |
3 | 30/06/2018 | 31/03/2019 | 3.14 | 2.69 |
3 | 30/06/2018 | 30/06/2018 | 3.14 | 2.70 |
3 | 30/06/2018 | 30/06/2019 | 3.14 | 2.71 |
3 | 30/06/2018 | 31/12/2018 | 3.14 | 2.72 |
3 | 30/06/2018 | 30/09/2018 | 3.14 | 2.73 |
Need:
ID | q0 | q1 | q2 | q3 | q4 | q5 |
1 | 2.34 | 2.48 | 2.49 | 2.5 | ||
2 | 2.82 | 2.75 | ||||
3 | 3.14 | 2.7 | 2.71 | 2.72 | 2.73 |
q0 always takes the first row of the FB field
q1-qx always takes the second row of the B field onwards (transposed)
Can you please suggest ways to code this?
Thanks in advance
data pretrans;
set have;
by id;
if first.id
then do;
b = fb;
count = 1;
end;
else count + 1;
_name_ = cats("q",count);
keep id _name_ b;
run;
proc transpose
data=pretrans
out=want (drop=_name_)
;
by id;
var b;
id _name_;
run;
For tested code, please supply data in usable form (data step with datalines);
You can use:
proc transpose data=have out=want;
var fb b;
by id;
run;
to create below dataset.
Next you can split this into two datasets FB (keep=ID COL1) and B (keep=ID COL2 COL3 COL4 COL5), based on _NAME_, and merge them and rename variables to Q0, Q1 etc.
data pretrans;
set have;
by id;
if first.id
then do;
b = fb;
count = 1;
end;
else count + 1;
_name_ = cats("q",count);
keep id _name_ b;
run;
proc transpose
data=pretrans
out=want (drop=_name_)
;
by id;
var b;
id _name_;
run;
For tested code, please supply data in usable form (data step with datalines);
There is no need to RETAIN count, as it is implicitly retained by using the Sum Statement .
PS good catch on that zero as a starting value.
That's a bit quirky to have Q1 come from a different variable than Q2 - Q5, but here you go, results, with code below the results.
Jim
Code:
DATA Have;
INFILE DATALINES DSD DLM='09'x;
INPUT ID
_Date : ANYDTDTE10.
_Date1 : ANYDTDTE10.
_FB
_B
;
FORMAT _Date DDMMYY10.;
FORMAT _Date1 DDMMYY10.;
DATALINES;
1 30/09/2018 30/06/2019 2.34 2.47
1 30/09/2018 31/03/2019 2.34 2.48
1 30/09/2018 31/12/2018 2.34 2.49
1 30/09/2018 30/09/2018 2.34 2.50
2 31/03/2019 30/06/2019 2.82 2.74
2 31/03/2019 31/03/2019 2.82 2.75
3 30/06/2018 31/03/2019 3.14 2.69
3 30/06/2018 30/06/2018 3.14 2.70
3 30/06/2018 30/06/2019 3.14 2.71
3 30/06/2018 31/12/2018 3.14 2.72
3 30/06/2018 30/09/2018 3.14 2.73
;
Run;
PROC SQL NOPRINT;
SELECT LEFT(PUT(MAX(ID_Count), 8.))
INTO : Max
FROM (
SELECT COUNT(1) AS ID_Count
FROM Have
GROUP BY ID
);
RUN;
%PUT NOTE: &=Max;
DATA Need;
DROP _:;
SET Have;
BY ID;
RETAIN _i;
RETAIN Q1 - Q&Max;
ARRAY QQ [*] Q1 - Q&Max;
IF FIRST.ID THEN
DO;
Q1 = _FB;
_i = 1;
END;
ELSE
DO;
_i + 1;
QQ[_i] = _B;
END;
IF LAST.ID THEN
DO;
OUTPUT;
CALL MISSING(of QQ[*]);
END;
RUN;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.