Solved: selecting the first ocuurance

KABONLINE · Posted 07-03-2019 07:07 AM

I have the data of ‘k’ (>1000)companies (C) and each company is having ‘N’ variables(=18) as a time series ( dates). But, the starting date i.e. first non_zero or non_null or non_blank value for each company is not same.

Date	C1var1	C1var2	….	C1varN	C2var1	C2var2	…..	C2varN	….	CKvar1	CKvar2	…	CKvarN
1/1/2018
2/1/2018					1279.38	70	….	70				….
3/1/2018	1515.61	82.8	…..	83.3	1320.59	67		72.5
4/1/2018	1537.17	81.25		85.25	1364.19	70		72.9		1221.88	68		68
5/1/2018	1464.34	81.4		81.85	1321.55	72		73		1240.57	63.1		66.9
6/1/2018	1449.96	77.55		78.7	1256.86	68.7		68.7		1254.94	64.55		66.9
7/1/2018	1338.32	76.9		76.9	1243.92	66		66.85		1232.9	67		69.85

I have to pickup the data of each company at its starting point and each variable should be in same column as follows

Company	Date	Var1	Var2	Vark
C1	3/1/2018	1515.61	82.8	83.3
C2	2/1/2018	1279.38	70	70
…	…			…
CK	4/1/2018	1221.88	68	68

What would be the best codes e to get the data in desired format

Version 9.4

PeterClemmensen · Posted 07-03-2019 07:44 AM

Welcome to the SAS Communities 🙂

If your data is representable of your actual problem, then you can do something like this

data have;
infile datalines dlm=',' dsd;
input Date:ddmmyy10. C1var1 C1var2 C1var3 C2var1 C2var2 C2var3;
format Date ddmmyy10.;
datalines;
1/1/2018, , , , , , 
2/1/2018, , , ,1279.38,70,70
3/1/2018,1515.61,82.8,83.3,1320.59,67,72.5
4/1/2018,1537.17,81.25,85.25,1364.19,70,72.9
;

proc transpose data=have out=havelong;
    by Date;
run;

proc sql;
    create table temp as
    select Date,
           substr(_NAME_, 1, 2) as Company,
           substr(_NAME_, 3, length(_NAME_)) as var,
           COL1 as value
    from havelong
    where COL1 ne .
    group by Company
    having date=min(date)
    order by Company, var;
quit;

proc transpose data=temp out=want(drop=_NAME_);
    by Company Date;
    id var;
    var value;
run;

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

KachiM · Posted 07-03-2019 07:31 AM

I have the data of ‘k’ (>1000)companies (C) and each company is having   ‘N’ variables(=18)  as a time series ( dates). But, the starting date i.e. first non_zero or non_null or non_blank value for each company is not same.

Unable to understand your K. Will there be 18*K + 1 columns per observation?

The Array will be useful for this purpose.

You need to specify what is K(or C?) so that array can be suitably sized.

18 cells at a time can be scanned in the array to look for the first non-empty cell as the starting point and proceed up to 18th cell. output C besides those non-missing values.

Similarly next 18 cells and on.

Finally sort the output Data Set by C and Date(edited).

PeterClemmensen · Posted 07-03-2019 07:44 AM

Welcome to the SAS Communities 🙂

If your data is representable of your actual problem, then you can do something like this

data have;
infile datalines dlm=',' dsd;
input Date:ddmmyy10. C1var1 C1var2 C1var3 C2var1 C2var2 C2var3;
format Date ddmmyy10.;
datalines;
1/1/2018, , , , , , 
2/1/2018, , , ,1279.38,70,70
3/1/2018,1515.61,82.8,83.3,1320.59,67,72.5
4/1/2018,1537.17,81.25,85.25,1364.19,70,72.9
;

proc transpose data=have out=havelong;
    by Date;
run;

proc sql;
    create table temp as
    select Date,
           substr(_NAME_, 1, 2) as Company,
           substr(_NAME_, 3, length(_NAME_)) as var,
           COL1 as value
    from havelong
    where COL1 ne .
    group by Company
    having date=min(date)
    order by Company, var;
quit;

proc transpose data=temp out=want(drop=_NAME_);
    by Company Date;
    id var;
    var value;
run;

The DATA to DATA Step Macro
Blog: SASnrd

KachiM · Posted 07-03-2019 10:04 AM

This illustrates the use of Array. I have assumed 3 Companies each having a name of 2 bytes. Further 3 variables are assumed. Four observations are processed.

data have;
input date c1v1 c1v2 c1v3 c2v1 c2v2 c2v3 c3v1 c3v2 c3v3;
datalines;
1  .   .  .  21 22 23  31 32 33
2 11  12  13  .  .  .  31 32 33
3 11  12  13 21 22 23  31 32 33
4 11  12  13 21 22 23  31 32 33
;
run;


data want;
   length company $2;
   set have;
   array k c1v1 -- c3v3;
   array v v1 - v3;
   
   do i = 1 to dim(k);
      if missing(k[i]) then continue;
      else do;
         j = mod(i,3);
         if j = 0 then j = 3;
         v[j] = k[i];
         company = vname(k[i]);
      end;
      if j = 3 then output;
   end;
keep company date v:;
run;

Note Company has a length of 2 bytes. So in

company = vname(k[i]);

Company gets the first 2 bytes from each variable name.

It looks there is no need for Sorting by Company and Date.

KachiM · Posted 07-03-2019 12:48 PM

In my previous post I used Array to give the output. I am not sure whether that is the right output. Now I give the Input Data set and the derived output obtained using Array. Do you want the first non-missing values for your variables (18 in your case) and ignore all other values in that Row? Check and say what output you need.

The Input Data Set:

data have;
input date c1v1 c1v2 c1v3 c2v1 c2v2 c2v3 c3v1 c3v2 c3v3;
datalines;
1  .   .  .  21 22 23  31 32 33
2 11  12  13  .  .  .  31 32 33
3 11  12  13 21 22 23  31 32 33
4 11  12  13 21 22 23  31 32 33
;
run;

The Output Data set:

company 	date 	v1 	v2 	v3
c2 	1 	21 	22 	23
c3 	1 	31 	32 	33
c1 	2 	11 	12 	13
c3 	2 	31 	32 	33
c1 	3 	11 	12 	13
c2 	3 	21 	22 	23
c3 	3 	31 	32 	33
c1 	4 	11 	12 	13
c2 	4 	21 	22 	23
c3 	4 	31 	32 	33

selecting the first ocuurance

Re: selecting the first ocuurance

Re: selecting the first ocuurance

Re: selecting the first ocuurance

Re: selecting the first ocuurance

Re: selecting the first ocuurance

Catch up on SAS Innovate 2026

selecting the first ocuurance

Re: selecting the first ocuurance

Re: selecting the first ocuurance

Re: selecting the first ocuurance

Re: selecting the first ocuurance

Re: selecting the first ocuurance

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away