Solved: Re: how to repeat information from previous line

Satori · Posted 12-11-2023 11:12 AM

This is the data I have:

data have;
input cname $ id $ assets rev sales emp code1 $ code2 $;
datalines;
Company1 CN9360002629 4196808624000 2606362179000 2606362179000 908500 4911 .
. . . . . . 4931 .
Company2 CN9375780665 101027000 1477554657000 1477554657000 80 7372 .
Company3 AU092210127 . . . . 5172 4953
Company4 BE0840549639 130972000000 64776000000 41034000000 34 1521 1521
. . . . . . 1522 1522
. . . . . . 1531 1531
. . . . . . 1541 1541
. . . . . . 6712 6500
. . . . . . 6719 7389
. . . . . . 6722 8740
;

The lines that do not have the variables cname, id have data that belong to the previous cname and id. I would like to add the data that is in the previous line.

What I want:

data want;
input cname $ id $ assets rev sales emp code1 $ code2 $;
datalines;
Company1 CN9360002629 4196808624000 2606362179000 2606362179000 908500 4911 .
Company1 CN9360002629 4196808624000 2606362179000 2606362179000 908500 4931 .
Company2 CN9375780665 101027000 1477554657000 1477554657000 80 7372 .
Company3 AU092210127 . . . . 5172 4953
Company4 BE0840549639 130972000000 64776000000 41034000000 34 1521 1521
Company4 BE0840549639 130972000000 64776000000 41034000000 34 1522 1522
Company4 BE0840549639 130972000000 64776000000 41034000000 34 1531 1531
Company4 BE0840549639 130972000000 64776000000 41034000000 34 1541 1541
Company4 BE0840549639 130972000000 64776000000 41034000000 34 6712 6500
Company4 BE0840549639 130972000000 64776000000 41034000000 34 6719 7389
Company4 BE0840549639 130972000000 64776000000 41034000000 34 6722 8740
;

A_Kh · Posted 12-11-2023 07:27 PM

For given dataset and output (without explanation of requirements) it would be something like this:

data have;
input cname $ id $ assets rev sales emp code1 $ code2 $;
format _numeric_ 16.;
datalines;
Company1 CN9360002629 4196808624000 2606362179000 2606362179000 908500 4911 .
. . . . . . 4931 .
Company2 CN9375780665 101027000 1477554657000 1477554657000 80 7372 .
Company3 AU092210127 . . . . 5172 4953
Company4 BE0840549639 130972000000 64776000000 41034000000 34 1521 1521
. . . . . . 1522 1522
. . . . . . 1531 1531
. . . . . . 1541 1541
. . . . . . 6712 6500
. . . . . . 6719 7389
. . . . . . 6722 8740
;
proc print;run; 

data temp;
	set have;
	length _cname $15;
	retain _cname;
	if cname ne '' then _cname=cname;
	if cname eq '' then cname=_cname;
	drop _cname; 
run; 

data want;
	set temp;
	by cname;
	length _id _code1 _code2 $15 _assets _rev _sales _emp 8;
	retain  _id _code1 _code2 _assets _rev _sales _emp; 
	if not last.cname then do;
		if id ne '' then _id=id;
		if code1 ne '' then _code1=code1;
		if code2 ne '' then _code2=code2;
		if assets ne . then _assets=assets;
		if rev ne . then _rev=rev;
		if sales ne . then _sales=sales;
		if emp ne . then _emp=emp;
	end; 
	if not first.cname then do;	
		if id eq '' then id= _id;
		if code1 eq '' then code1=_code1;
		if code2 eq '' then code2=_code2;
		if assets eq . then assets=_assets;
		if rev eq . then rev=_rev;
		if sales eq . then sales=_sales;
		if emp eq . then emp=_emp;
	end; 
	drop _:;
proc print; run;

Result;

First, CNAME is being carried forward for missing observations . Then data is being processed in BY-Groups (by CNAME) and missing observations populated using RETAIN statement. If there are more variables ARRAY would be a more efficient way in terms of coding.

View solution in original post

Tom · Posted 12-11-2023 11:39 AM

RETAIN the variables. Check the line before reading the variables.

data have;
  input dummy $ @1 @;
  if dummy ne ' ' then input cname $ id $ assets rev sales emp code1 $ code2 $;
  else input (6*dummy code1 code2) (:$);
  drop dummy ;
  retain cname -- emp ;
datalines;
Company1 CN9360002629 4196808624000 2606362179000 2606362179000 908500 4911 .
. . . . . . 4931 .
Company2 CN9375780665 101027000 1477554657000 1477554657000 80 7372 .
Company3 AU092210127 . . . . 5172 4953
Company4 BE0840549639 130972000000 64776000000 41034000000 34 1521 1521
. . . . . . 1522 1522
. . . . . . 1531 1531
. . . . . . 1541 1541
. . . . . . 6712 6500
. . . . . . 6719 7389
. . . . . . 6722 8740
;

If you unfortunately already have the messed up data as a dataset you could treat it as a last observations carried forward problem. But you really need a grouping variable for that.

data groups;
  retain group;
  set have;
  group + (cname ne ' ') ;
run;

data want;
  update groups(obs=0) groups;
  by group;
  output;
run;

Result

Obs    group     cname         id             assets            rev          sales       emp    code1    code2

  1      1      Company1    CN936000    4.1968086E12    2.606362E12    2.606362E12    908500    4911
  2      1      Company1    CN936000    4.1968086E12    2.606362E12    2.606362E12    908500    4931
  3      2      Company2    CN937578       101027000    1.477555E12    1.477555E12        80    7372
  4      3      Company3    AU092210               .              .              .         .    5172     4953
  5      4      Company4    BE084054    130972000000    64776000000    41034000000        34    1521     1521
  6      4      Company4    BE084054    130972000000    64776000000    41034000000        34    1522     1522
  7      4      Company4    BE084054    130972000000    64776000000    41034000000        34    1531     1531
  8      4      Company4    BE084054    130972000000    64776000000    41034000000        34    1541     1541
  9      4      Company4    BE084054    130972000000    64776000000    41034000000        34    6712     6500
 10      4      Company4    BE084054    130972000000    64776000000    41034000000        34    6719     7389
 11      4      Company4    BE084054    130972000000    64776000000    41034000000        34    6722     8740

Id you don't want the missing values of CODE1 and CODE2 filled in with the previous value then add this statement after the BY statement.

set groups(keep=code1 code2);

Satori · Posted 12-11-2023 01:20 PM

How can I do this for a dataset that already exists? Can I use this in a data step?

A_Kh · Posted 12-11-2023 07:27 PM

For given dataset and output (without explanation of requirements) it would be something like this:

data have;
input cname $ id $ assets rev sales emp code1 $ code2 $;
format _numeric_ 16.;
datalines;
Company1 CN9360002629 4196808624000 2606362179000 2606362179000 908500 4911 .
. . . . . . 4931 .
Company2 CN9375780665 101027000 1477554657000 1477554657000 80 7372 .
Company3 AU092210127 . . . . 5172 4953
Company4 BE0840549639 130972000000 64776000000 41034000000 34 1521 1521
. . . . . . 1522 1522
. . . . . . 1531 1531
. . . . . . 1541 1541
. . . . . . 6712 6500
. . . . . . 6719 7389
. . . . . . 6722 8740
;
proc print;run; 

data temp;
	set have;
	length _cname $15;
	retain _cname;
	if cname ne '' then _cname=cname;
	if cname eq '' then cname=_cname;
	drop _cname; 
run; 

data want;
	set temp;
	by cname;
	length _id _code1 _code2 $15 _assets _rev _sales _emp 8;
	retain  _id _code1 _code2 _assets _rev _sales _emp; 
	if not last.cname then do;
		if id ne '' then _id=id;
		if code1 ne '' then _code1=code1;
		if code2 ne '' then _code2=code2;
		if assets ne . then _assets=assets;
		if rev ne . then _rev=rev;
		if sales ne . then _sales=sales;
		if emp ne . then _emp=emp;
	end; 
	if not first.cname then do;	
		if id eq '' then id= _id;
		if code1 eq '' then code1=_code1;
		if code2 eq '' then code2=_code2;
		if assets eq . then assets=_assets;
		if rev eq . then rev=_rev;
		if sales eq . then sales=_sales;
		if emp eq . then emp=_emp;
	end; 
	drop _:;
proc print; run;

Result;

First, CNAME is being carried forward for missing observations . Then data is being processed in BY-Groups (by CNAME) and missing observations populated using RETAIN statement. If there are more variables ARRAY would be a more efficient way in terms of coding.

Tom · Posted 12-11-2023 08:05 PM

@Satori wrote:
How can I do this for a dataset that already exists? Can I use this in a data step?

See updated answer.

If you have the data as text, like your posted example, then you can use a data step that checks the line and only reads the values when they exists and remembers (RETAINs) them.

Otherwise I showed how you can use the UPDATE statement to handle last observation carried forward. But that requires first calculating a grouping variable.

Otherwise you will need to make a series of NEW variables that you can RETAIN to remember the value from the earlier observations. Here is a pattern. Fill in the other variables fallowing the pattern.

data want ;
  set have;
  if not missing(cname) then do;
     _cname=cname;
     _id = id;
    ...
  end;
  else do;
     cname=_cname;
     id=_id;
     ....
  end;
  retain _cname _id ...;
  drop _cname _id ...;
run;

how to repeat information from previous line

Re: how to repeat information from previous line

Re: how to repeat information from previous line

Re: how to repeat information from previous line

Re: how to repeat information from previous line

Re: how to repeat information from previous line

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!