I have the data of ‘k’ (>1000)companies (C) and each company is having ‘N’ variables(=18) as a time series ( dates). But, the starting date i.e. first non_zero or non_null or non_blank value for each company is not same.
Date | C1var1 | C1var2 | …. | C1varN | C2var1 | C2var2 | ….. | C2varN | …. | CKvar1 | CKvar2 | … | CKvarN |
1/1/2018 |
|
|
|
| |||||||||
2/1/2018 |
| 1279.38 | 70 | …. | 70 |
| …. | ||||||
3/1/2018 | 1515.61 | 82.8 | ….. | 83.3 | 1320.59 | 67 |
| 72.5 |
|
| |||
4/1/2018 | 1537.17 | 81.25 |
| 85.25 | 1364.19 | 70 |
| 72.9 |
| 1221.88 | 68 |
| 68 |
5/1/2018 | 1464.34 | 81.4 |
| 81.85 | 1321.55 | 72 |
| 73 |
| 1240.57 | 63.1 |
| 66.9 |
6/1/2018 | 1449.96 | 77.55 |
| 78.7 | 1256.86 | 68.7 |
| 68.7 |
| 1254.94 | 64.55 |
| 66.9 |
7/1/2018 | 1338.32 | 76.9 |
| 76.9 | 1243.92 | 66 |
| 66.85 |
| 1232.9 | 67 |
| 69.85 |
I have to pickup the data of each company at its starting point and each variable should be in same column as follows
Company | Date | Var1 | Var2 |
| Vark |
C1 | 3/1/2018 | 1515.61 | 82.8 |
| 83.3 |
C2 | 2/1/2018 | 1279.38 | 70 |
| 70 |
… | … |
|
|
| … |
CK | 4/1/2018 | 1221.88 | 68 |
| 68 |
What would be the best codes e to get the data in desired format
Version 9.4
Welcome to the SAS Communities 🙂
If your data is representable of your actual problem, then you can do something like this
data have;
infile datalines dlm=',' dsd;
input Date:ddmmyy10. C1var1 C1var2 C1var3 C2var1 C2var2 C2var3;
format Date ddmmyy10.;
datalines;
1/1/2018, , , , , ,
2/1/2018, , , ,1279.38,70,70
3/1/2018,1515.61,82.8,83.3,1320.59,67,72.5
4/1/2018,1537.17,81.25,85.25,1364.19,70,72.9
;
proc transpose data=have out=havelong;
by Date;
run;
proc sql;
create table temp as
select Date,
substr(_NAME_, 1, 2) as Company,
substr(_NAME_, 3, length(_NAME_)) as var,
COL1 as value
from havelong
where COL1 ne .
group by Company
having date=min(date)
order by Company, var;
quit;
proc transpose data=temp out=want(drop=_NAME_);
by Company Date;
id var;
var value;
run;
I have the data of ‘k’ (>1000)companies (C) and each company is having ‘N’ variables(=18) as a time series ( dates). But, the starting date i.e. first non_zero or non_null or non_blank value for each company is not same.
Unable to understand your K. Will there be 18*K + 1 columns per observation?
The Array will be useful for this purpose.
You need to specify what is K(or C?) so that array can be suitably sized.
18 cells at a time can be scanned in the array to look for the first non-empty cell as the starting point and proceed up to 18th cell. output C besides those non-missing values.
Similarly next 18 cells and on.
Finally sort the output Data Set by C and Date(edited).
Welcome to the SAS Communities 🙂
If your data is representable of your actual problem, then you can do something like this
data have;
infile datalines dlm=',' dsd;
input Date:ddmmyy10. C1var1 C1var2 C1var3 C2var1 C2var2 C2var3;
format Date ddmmyy10.;
datalines;
1/1/2018, , , , , ,
2/1/2018, , , ,1279.38,70,70
3/1/2018,1515.61,82.8,83.3,1320.59,67,72.5
4/1/2018,1537.17,81.25,85.25,1364.19,70,72.9
;
proc transpose data=have out=havelong;
by Date;
run;
proc sql;
create table temp as
select Date,
substr(_NAME_, 1, 2) as Company,
substr(_NAME_, 3, length(_NAME_)) as var,
COL1 as value
from havelong
where COL1 ne .
group by Company
having date=min(date)
order by Company, var;
quit;
proc transpose data=temp out=want(drop=_NAME_);
by Company Date;
id var;
var value;
run;
This illustrates the use of Array. I have assumed 3 Companies each having a name of 2 bytes. Further 3 variables are assumed. Four observations are processed.
data have; input date c1v1 c1v2 c1v3 c2v1 c2v2 c2v3 c3v1 c3v2 c3v3; datalines; 1 . . . 21 22 23 31 32 33 2 11 12 13 . . . 31 32 33 3 11 12 13 21 22 23 31 32 33 4 11 12 13 21 22 23 31 32 33 ; run; data want; length company $2; set have; array k c1v1 -- c3v3; array v v1 - v3; do i = 1 to dim(k); if missing(k[i]) then continue; else do; j = mod(i,3); if j = 0 then j = 3; v[j] = k[i]; company = vname(k[i]); end; if j = 3 then output; end; keep company date v:; run;
Note Company has a length of 2 bytes. So in
company = vname(k[i]);
Company gets the first 2 bytes from each variable name.
It looks there is no need for Sorting by Company and Date.
In my previous post I used Array to give the output. I am not sure whether that is the right output. Now I give the Input Data set and the derived output obtained using Array. Do you want the first non-missing values for your variables (18 in your case) and ignore all other values in that Row? Check and say what output you need.
The Input Data Set: data have; input date c1v1 c1v2 c1v3 c2v1 c2v2 c2v3 c3v1 c3v2 c3v3; datalines; 1 . . . 21 22 23 31 32 33 2 11 12 13 . . . 31 32 33 3 11 12 13 21 22 23 31 32 33 4 11 12 13 21 22 23 31 32 33 ; run;
The Output Data set:
company date v1 v2 v3 c2 1 21 22 23 c3 1 31 32 33 c1 2 11 12 13 c3 2 31 32 33 c1 3 11 12 13 c2 3 21 22 23 c3 3 31 32 33 c1 4 11 12 13 c2 4 21 22 23 c3 4 31 32 33
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.