I have the data of ‘k’ (>1000)companies (C) and each company is having ‘N’ variables(=18) as a time series ( dates). But, the starting date i.e. first non_zero or non_null or non_blank value for each company is not same.
Date | C1var1 | C1var2 | …. | C1varN | C2var1 | C2var2 | ….. | C2varN | …. | CKvar1 | CKvar2 | … | CKvarN |
1/1/2018 |
|
|
|
| |||||||||
2/1/2018 |
| 1279.38 | 70 | …. | 70 |
| …. | ||||||
3/1/2018 | 1515.61 | 82.8 | ….. | 83.3 | 1320.59 | 67 |
| 72.5 |
|
| |||
4/1/2018 | 1537.17 | 81.25 |
| 85.25 | 1364.19 | 70 |
| 72.9 |
| 1221.88 | 68 |
| 68 |
5/1/2018 | 1464.34 | 81.4 |
| 81.85 | 1321.55 | 72 |
| 73 |
| 1240.57 | 63.1 |
| 66.9 |
6/1/2018 | 1449.96 | 77.55 |
| 78.7 | 1256.86 | 68.7 |
| 68.7 |
| 1254.94 | 64.55 |
| 66.9 |
7/1/2018 | 1338.32 | 76.9 |
| 76.9 | 1243.92 | 66 |
| 66.85 |
| 1232.9 | 67 |
| 69.85 |
I have to pickup the data of each company at its starting point and each variable should be in same column as follows
Company | Date | Var1 | Var2 |
| Vark |
C1 | 3/1/2018 | 1515.61 | 82.8 |
| 83.3 |
C2 | 2/1/2018 | 1279.38 | 70 |
| 70 |
… | … |
|
|
| … |
CK | 4/1/2018 | 1221.88 | 68 |
| 68 |
What would be the best codes e to get the data in desired format
Version 9.4
Welcome to the SAS Communities 🙂
If your data is representable of your actual problem, then you can do something like this
data have;
infile datalines dlm=',' dsd;
input Date:ddmmyy10. C1var1 C1var2 C1var3 C2var1 C2var2 C2var3;
format Date ddmmyy10.;
datalines;
1/1/2018, , , , , ,
2/1/2018, , , ,1279.38,70,70
3/1/2018,1515.61,82.8,83.3,1320.59,67,72.5
4/1/2018,1537.17,81.25,85.25,1364.19,70,72.9
;
proc transpose data=have out=havelong;
by Date;
run;
proc sql;
create table temp as
select Date,
substr(_NAME_, 1, 2) as Company,
substr(_NAME_, 3, length(_NAME_)) as var,
COL1 as value
from havelong
where COL1 ne .
group by Company
having date=min(date)
order by Company, var;
quit;
proc transpose data=temp out=want(drop=_NAME_);
by Company Date;
id var;
var value;
run;
I have the data of ‘k’ (>1000)companies (C) and each company is having ‘N’ variables(=18) as a time series ( dates). But, the starting date i.e. first non_zero or non_null or non_blank value for each company is not same.
Unable to understand your K. Will there be 18*K + 1 columns per observation?
The Array will be useful for this purpose.
You need to specify what is K(or C?) so that array can be suitably sized.
18 cells at a time can be scanned in the array to look for the first non-empty cell as the starting point and proceed up to 18th cell. output C besides those non-missing values.
Similarly next 18 cells and on.
Finally sort the output Data Set by C and Date(edited).
Welcome to the SAS Communities 🙂
If your data is representable of your actual problem, then you can do something like this
data have;
infile datalines dlm=',' dsd;
input Date:ddmmyy10. C1var1 C1var2 C1var3 C2var1 C2var2 C2var3;
format Date ddmmyy10.;
datalines;
1/1/2018, , , , , ,
2/1/2018, , , ,1279.38,70,70
3/1/2018,1515.61,82.8,83.3,1320.59,67,72.5
4/1/2018,1537.17,81.25,85.25,1364.19,70,72.9
;
proc transpose data=have out=havelong;
by Date;
run;
proc sql;
create table temp as
select Date,
substr(_NAME_, 1, 2) as Company,
substr(_NAME_, 3, length(_NAME_)) as var,
COL1 as value
from havelong
where COL1 ne .
group by Company
having date=min(date)
order by Company, var;
quit;
proc transpose data=temp out=want(drop=_NAME_);
by Company Date;
id var;
var value;
run;
This illustrates the use of Array. I have assumed 3 Companies each having a name of 2 bytes. Further 3 variables are assumed. Four observations are processed.
data have; input date c1v1 c1v2 c1v3 c2v1 c2v2 c2v3 c3v1 c3v2 c3v3; datalines; 1 . . . 21 22 23 31 32 33 2 11 12 13 . . . 31 32 33 3 11 12 13 21 22 23 31 32 33 4 11 12 13 21 22 23 31 32 33 ; run; data want; length company $2; set have; array k c1v1 -- c3v3; array v v1 - v3; do i = 1 to dim(k); if missing(k[i]) then continue; else do; j = mod(i,3); if j = 0 then j = 3; v[j] = k[i]; company = vname(k[i]); end; if j = 3 then output; end; keep company date v:; run;
Note Company has a length of 2 bytes. So in
company = vname(k[i]);
Company gets the first 2 bytes from each variable name.
It looks there is no need for Sorting by Company and Date.
In my previous post I used Array to give the output. I am not sure whether that is the right output. Now I give the Input Data set and the derived output obtained using Array. Do you want the first non-missing values for your variables (18 in your case) and ignore all other values in that Row? Check and say what output you need.
The Input Data Set: data have; input date c1v1 c1v2 c1v3 c2v1 c2v2 c2v3 c3v1 c3v2 c3v3; datalines; 1 . . . 21 22 23 31 32 33 2 11 12 13 . . . 31 32 33 3 11 12 13 21 22 23 31 32 33 4 11 12 13 21 22 23 31 32 33 ; run;
The Output Data set:
company date v1 v2 v3 c2 1 21 22 23 c3 1 31 32 33 c1 2 11 12 13 c3 2 31 32 33 c1 3 11 12 13 c2 3 21 22 23 c3 3 31 32 33 c1 4 11 12 13 c2 4 21 22 23 c3 4 31 32 33
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.