BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Yegen
Pyrite | Level 9

I am trying to replace the missing industry codes for a given company with its most closest industry code (by date). I was able to write a code that replaces the missing value if there all industry codes for the given company are identical. However, in my data, occasionally the industry code changes, but I am not sure how I can replace the nearest industry code with the ones that are missing. With the following data, my code works well:

DATA work.sample;
	INPUT ID Year Industry;
	DATALINES; 
	004605 1972 . 
	004605 1973 .
	004605 1974 6798
	004610 1984 6799
	004610 1985 .
;
RUN;


data work.non_missing;
	set work.sample;
	if industry^=.;
	industry_correct=industry;
run;

proc sql;
	create table work.adding_industry as 
	select * 
	from work.sample as a 
	left join work.non_missing as b 
	on a.id=b.id;
quit;


However, with the following data (with non-identical industry codes for a given company) my code fails:

DATA work.sample;
	INPUT ID Year Industry;
	DATALINES; 
	004605 1972 . 
	004605 1973 .
	004605 1974 6798
	004605 1979 7788
	004610 1984 6799
	004610 1985 .
;
RUN;

Here is what the correct code will generate:

DATA work.want;
	INPUT ID Year Industry;
	DATALINES; 
	004605 1972 6798 
	004605 1973 6798
	004605 1974 6798
	004605 1979 7788
	004610 1984 6799
	004610 1985 6799
;
RUN;


I thought of the following, but I am wondering if there is a more robust way of doing this? 

proc sql;
	create table work.adding_industry as 
	select * 
	from work.sample as a 
	left join work.non_missing as b 
	on a.id=b.id and a.year=<b.year;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Try next code:

 

proc sort data=have; by id year; run;

data temp;
 set have;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;

proc sort data=temp; by id descending year; run;

data want;
 set temp;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;
             

View solution in original post

6 REPLIES 6
Shmuel
Garnet | Level 18

Try next code:

 

proc sort data=have; by id year; run;

data temp;
 set have;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;

proc sort data=temp; by id descending year; run;

data want;
 set temp;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;
             
Yegen
Pyrite | Level 9

Thanks, @Shmuel. The code works very well. The retain command is very powerful, I have to make use of it more often! 

The following part (let's call it part 1) of your code replaces missing observations with non-missing historical values:

data temp;
 set have;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;

The last part of your code (let's call it part 2) replaces the the missing observations with non-missing future values:

proc sort data=temp; by id descending year; run;

data want;
 set temp;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;


Suppose I have the following data:

DATA work.sample;
	INPUT ID Year Industry;
	DATALINES; 
	004610 1983 .
	004610 1984 6799
	004610 1985 9999
	004610 1986 .
	004610 1987 .
	004610 1988 8888
	004610 1989 .
	004610 1990 9999
;
RUN;

Whenever there is a future non-missing industry code and the code replaces missing industry code with that particular future non-missing idustry code, then we have the following output:

DATA work.want1;
	INPUT ID Year Industry;
	DATALINES; 
	004610 1983 6799
	004610 1984 6799
	004610 1985 9999
	004610 1986 8888
	004610 1987 8888
	004610 1988 8888
	004610 1989 9999
	004610 1990 9999
;


However, with your code the year=1989 is replaced with 8888 rather than 9999. Would you suggest just to remove the order of the code?  I.e., replace part 1 with part 2?

 

Update: @Shmuel, I just tried the reverse order (part 2, then part 1), but the output seems incorrect. 

Shmuel
Garnet | Level 18

You can try change sort types:

- 1st to descending

- 2nd drop the ascending to be ascening

 

I hope that may help. I haven't tried it.

 

Otherwise: maybe next code should help , to be entered after ascending sort


if last.id and industry=. then industry=9999;
Yegen
Pyrite | Level 9

This is very helpful, @Shmuel. I will give it a try. Thanks again for all of your valuable  suggestions.

TarunKumar
Pyrite | Level 9
DATA work.sample; INPUT ID Year Industry; DATALINES; 004605 1972 . 004605 1973 . 004605 1974 6798 004610 1984 6799 004610 1985 . ; RUN; PROC SORT DATA=SAMPLE ;BY ID Industry ;RUN; DATA SAMPLE_1; SET SAMPLE; BY ID ; IF LAST.ID THEN OUTPUT; RUN; PROC SQL; CREATE TABLE WANT AS SELECT A.*,B.Industry AS Industry_NEW FROM SAMPLE AS A LEFT JOIN SAMPLE_1 AS B ON A.ID = B.ID; QUIT;
Yegen
Pyrite | Level 9

 

Thanks, @TarunKumar. I have tried your code with the following dataset, but the output seems incorrect:

DATA work.sample;
	INPUT ID Year Industry;
	DATALINES; 
	004610 1983 .
	004610 1984 6799
	004610 1985 9999
	004610 1986 .
	004610 1987 .
	004610 1988 8888
	004610 1989 .
	004610 1990 9999
;
RUN;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 4934 views
  • 3 likes
  • 3 in conversation