BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Yegen
Pyrite | Level 9

I am trying to replace the missing industry codes for a given company with its most closest industry code (by date). I was able to write a code that replaces the missing value if there all industry codes for the given company are identical. However, in my data, occasionally the industry code changes, but I am not sure how I can replace the nearest industry code with the ones that are missing. With the following data, my code works well:

DATA work.sample;
	INPUT ID Year Industry;
	DATALINES; 
	004605 1972 . 
	004605 1973 .
	004605 1974 6798
	004610 1984 6799
	004610 1985 .
;
RUN;


data work.non_missing;
	set work.sample;
	if industry^=.;
	industry_correct=industry;
run;

proc sql;
	create table work.adding_industry as 
	select * 
	from work.sample as a 
	left join work.non_missing as b 
	on a.id=b.id;
quit;


However, with the following data (with non-identical industry codes for a given company) my code fails:

DATA work.sample;
	INPUT ID Year Industry;
	DATALINES; 
	004605 1972 . 
	004605 1973 .
	004605 1974 6798
	004605 1979 7788
	004610 1984 6799
	004610 1985 .
;
RUN;

Here is what the correct code will generate:

DATA work.want;
	INPUT ID Year Industry;
	DATALINES; 
	004605 1972 6798 
	004605 1973 6798
	004605 1974 6798
	004605 1979 7788
	004610 1984 6799
	004610 1985 6799
;
RUN;


I thought of the following, but I am wondering if there is a more robust way of doing this? 

proc sql;
	create table work.adding_industry as 
	select * 
	from work.sample as a 
	left join work.non_missing as b 
	on a.id=b.id and a.year=<b.year;
quit;
1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

Try next code:

 

proc sort data=have; by id year; run;

data temp;
 set have;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;

proc sort data=temp; by id descending year; run;

data want;
 set temp;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;
             

View solution in original post

6 REPLIES 6
Shmuel
Garnet | Level 18

Try next code:

 

proc sort data=have; by id year; run;

data temp;
 set have;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;

proc sort data=temp; by id descending year; run;

data want;
 set temp;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;
             
Yegen
Pyrite | Level 9

Thanks, @Shmuel. The code works very well. The retain command is very powerful, I have to make use of it more often! 

The following part (let's call it part 1) of your code replaces missing observations with non-missing historical values:

data temp;
 set have;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;

The last part of your code (let's call it part 2) replaces the the missing observations with non-missing future values:

proc sort data=temp; by id descending year; run;

data want;
 set temp;
      by id ;
          retain ic;   /* industry code */
          if first.id then ic=industry;
          if industry=. then industry=ic;
          else ic=industry; drop ic;
run;


Suppose I have the following data:

DATA work.sample;
	INPUT ID Year Industry;
	DATALINES; 
	004610 1983 .
	004610 1984 6799
	004610 1985 9999
	004610 1986 .
	004610 1987 .
	004610 1988 8888
	004610 1989 .
	004610 1990 9999
;
RUN;

Whenever there is a future non-missing industry code and the code replaces missing industry code with that particular future non-missing idustry code, then we have the following output:

DATA work.want1;
	INPUT ID Year Industry;
	DATALINES; 
	004610 1983 6799
	004610 1984 6799
	004610 1985 9999
	004610 1986 8888
	004610 1987 8888
	004610 1988 8888
	004610 1989 9999
	004610 1990 9999
;


However, with your code the year=1989 is replaced with 8888 rather than 9999. Would you suggest just to remove the order of the code?  I.e., replace part 1 with part 2?

 

Update: @Shmuel, I just tried the reverse order (part 2, then part 1), but the output seems incorrect. 

Shmuel
Garnet | Level 18

You can try change sort types:

- 1st to descending

- 2nd drop the ascending to be ascening

 

I hope that may help. I haven't tried it.

 

Otherwise: maybe next code should help , to be entered after ascending sort


if last.id and industry=. then industry=9999;
Yegen
Pyrite | Level 9

This is very helpful, @Shmuel. I will give it a try. Thanks again for all of your valuable  suggestions.

TarunKumar
Pyrite | Level 9
DATA work.sample; INPUT ID Year Industry; DATALINES; 004605 1972 . 004605 1973 . 004605 1974 6798 004610 1984 6799 004610 1985 . ; RUN; PROC SORT DATA=SAMPLE ;BY ID Industry ;RUN; DATA SAMPLE_1; SET SAMPLE; BY ID ; IF LAST.ID THEN OUTPUT; RUN; PROC SQL; CREATE TABLE WANT AS SELECT A.*,B.Industry AS Industry_NEW FROM SAMPLE AS A LEFT JOIN SAMPLE_1 AS B ON A.ID = B.ID; QUIT;
Yegen
Pyrite | Level 9

 

Thanks, @TarunKumar. I have tried your code with the following dataset, but the output seems incorrect:

DATA work.sample;
	INPUT ID Year Industry;
	DATALINES; 
	004610 1983 .
	004610 1984 6799
	004610 1985 9999
	004610 1986 .
	004610 1987 .
	004610 1988 8888
	004610 1989 .
	004610 1990 9999
;
RUN;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 3333 views
  • 3 likes
  • 3 in conversation