Is there a way that we can convert the below data steps into one proc SQL step?
data dis_entity_id ;
  set &library..&input_table ;
  format ENTITY_ID $ENTITY. REPORTING $50.;
  keep ENTITY_ID REPORTING;
run;
proc sort data=dis_entity_id nodupkey;
  by ENTITY_ID REPORTING;
run;
					
				
			
			
				
			
			
			
			
			
			
			
		I can't see your data so needless to say, this code is untested
proc sql;
    create table dis_entity_id as
    select distinct ENTITY_ID format=$ENTITY.,
           REPORTING format=$50.
    from &library..&input_table
    order by ENTITY_ID, REPORTING;
quit;
					
				
			
			
				
			
			
			
			
			
			
			
		Data steps which I mentioned in the intial post will create a variable(s) like ENTITY_ID, REPORTING... in case if it is not available in source dataset. However your code has ended with error as below. I forgot to mention this point in my post. How to tweak your code now to produce the desired result?
26         data test;
27         input REPORTING $;
28         datalines;
NOTE: The data set WORK.TEST has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
      
30         ;
31         run;
32         
33         proc sql;
34             create table dis_entity_id as
35             select distinct ENTITY_ID format=$ENTITY.,
36                    REPORTING format=$50.
37             from test
38             order by ENTITY_ID, REPORTING;
ERROR: The following columns were not found in the contributing tables: ENTITY_ID.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
39         quit;
I guess the problem is of keeping all original variables but changing the format of just two columns. Can be done with:
proc sql;
create table dis_entity_id as
	select * from &library..&input_table
	order by ENTITY_ID, REPORTING;
alter table dis_entity_id 
	modify 
		ENTITY_ID CHARACTER format=$ENTITY.,
		REPORTING CHARACTER format=$50.;
quit;
Data steps which I mentioned in the intial post will create a variable(s) like ENTITY_ID, REPORTING... in case if it is not available in source dataset. However your code has ended with error as below. I forgot to mention this point in my post. How to tweak your code now to produce the desired result?
26         proc sql;
27         create table dis_entity_id as
28         	select * from test
29         	order by ENTITY_ID, REPORTING;
ERROR: The following columns were not found in the contributing tables: ENTITY_ID.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
30         alter table dis_entity_id
31         	modify
32         		ENTITY_ID CHARACTER format=$ENTITY.,
33         		REPORTING CHARACTER format=$50.;
NOTE: Statement not executed due to NOEXEC option.
34         quit;
					
				
			
			
				
			
			
			
			
			
			
			
		
@Babloo wrote:
Data steps which I mentioned in the intial post will create a variable(s) like ENTITY_ID, REPORTING... in case if it is not available in source dataset. However your code has ended with error as below. I forgot to mention this point in my post. How to tweak your code now to produce the desired result?
26 proc sql; 27 create table dis_entity_id as 28 select * from test 29 order by ENTITY_ID, REPORTING; ERROR: The following columns were not found in the contributing tables: ENTITY_ID. NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements. 30 alter table dis_entity_id 31 modify 32 ENTITY_ID CHARACTER format=$ENTITY., 33 REPORTING CHARACTER format=$50.; NOTE: Statement not executed due to NOEXEC option. 34 quit;
Did you change the FROM data set name to one you actually have with the appropriate data?
You didn't explain the PURPOSE of the data step.
From your later comments it sounds like the purpose is to force the creation of those two variables. Not sure why you would want to attach the $50. format to the second variable, especially if it could already exist and have a different length than 50 bytes.
Let's assume that the purpose is to create REPORTING as a character variable of length 50. You don't show how you want ENTITY_ID created (or the definition of the $ENTITY. format) so let's just assume it should be a character variable of length 3 and use the $CHAR3. format as place holder for the $ENTITY. format in the code. Here is code using OUTER UNION CORRESPONDING and pulling the NAME variable from SASHELP.CLASS to give SQL a source for the definition of the potential extra columns.
proc sql ;
create table dis_entity_id as 
select * from &library..&input_table
outer union corresponding 
select name as entity_id length=3 format=$char3.
     , name as reporting length=50 
from sashelp.class(obs=0)
order by entity_id, reporting
;
quit;
Note that is is much harder to implement the NODUPKEY functionality of PROC SORT in PROC SQL. Unless the table actually only has those two variables. In which case the query could then be something like this:
proc sql ;
create table dis_entity_id as 
select distinct
   entity_id format=$char3.
 , reporting 
from
(
select * from &library..&input_table
outer union corresponding 
select name as entity_id length=3 
     , name as reporting length=50 
from sashelp.class(obs=0)
)
order by entity_id, reporting
;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.