BookmarkSubscribeRSS Feed
Hang
Calcite | Level 5
Hello, i hope you can help

I have a set of data with 474 columns and approximately 38000 rows

But i only need data that satisfy ‘6000’ on column “SIC”

Ive done the following:

Libname mydata ‘directory path’
Data work.df;
Set mydata.nameoffile;
Run;

Proc sql;
Select columns
From mydata.nameoffiles
Where SIC = ‘6000’

Quit;

But the results is blank...please help.

Thank you v much
15 REPLIES 15
ed_sas_member
Meteorite | Level 14

Hi @Hang 

 

Welcome to the community!

 

What is the type of column "SIC": character or numeric?

If it is numeric, you need to remove quotation marks: Where SIC = 6000

 

In your code, you have some missing semi-colons and some misspelling in the dataset name. Here is the correction:

libname mydata ‘directory path’;

data work.df;

set mydata.nameoffile;

run;

 

proc sql;

select columns from mydata.nameoffiles where SIC = ‘6000’;

quit;

 

Hang
Calcite | Level 5

Hi, thank you for your reply

 

this is the copy of my full code
 
libname mydata '/folders/myfolders/Company data';
data work.banks;
 set mydata.uk;
 keep SIC CONML DCSFD DLC DLTT EBITDA;
 
run;
proc sql;
 select SIC, CONML
 from mydata.uk
 where SIC = 6000;
 
quit;
 
 
the first part works, but the second part doesnt. ===ERROR: Expression using equals (=) has components that are of different data types.
 
Kurt_Bremser
Super User

PLEASE(!!!) post the whole(!) log, and USE THE {i} BUTTON for this. DO NOT SKIP THIS.

 

It seems that SIC is of type character, and you need to use the quotes. Please change the code, run it, and post the complete log as requested.

Hang
Calcite | Level 5
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 72 73 libname mydata '/folders/myfolders/Company data'; NOTE: Libref MYDATA refers to the same physical library as _TEMP0. NOTE: Libref MYDATA was successfully assigned as follows: Engine: V9 Physical Name: /folders/myfolders/Company data 74 data work.banks; 75 set mydata.uk; NOTE: Data file MYDATA.UK.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance. 76 keep SIC CONML DCSFD DLC DLTT EBITDA; 77 78 run; NOTE: There were 38841 observations read from the data set MYDATA.UK. NOTE: The data set WORK.BANKS has 38841 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 41.40 seconds cpu time 28.60 seconds 79 80 proc sql; 81 select SIC, CONML 82 from mydata.uk 83 where SIC = '6000'; NOTE: Data file MYDATA.UK.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance. NOTE: No rows were selected. 84 85 quit; NOTE: PROCEDURE SQL used (Total process time): real time 8.47 seconds cpu time 3.38 seconds 86 87 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; 99
Hang
Calcite | Level 5

@Kurt_Bremser 

 

full log

 

 
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72        
73 libname mydata '/folders/myfolders/Company data';
NOTE: Libref MYDATA refers to the same physical library as _TEMP0.
NOTE: Libref MYDATA was successfully assigned as follows:
       Engine:        V9
       Physical Name: /folders/myfolders/Company data
74         data work.banks;
75         set mydata.uk;
NOTE: Data file MYDATA.UK.DATA is in a format that is native to another host, or the file encoding does not match the session
       encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
       performance.
76         keep SIC CONML DCSFD DLC DLTT EBITDA;
77        
78         run;
 
NOTE: There were 38841 observations read from the data set MYDATA.UK.
NOTE: The data set WORK.BANKS has 38841 observations and 6 variables.
NOTE: DATA statement used (Total process time):
       real time           41.40 seconds
       cpu time            28.60 seconds
      
 
79        
80         proc sql;
81         select SIC, CONML
82         from mydata.uk
83         where SIC = '6000';
NOTE: Data file MYDATA.UK.DATA is in a format that is native to another host, or the file encoding does not match the session
       encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
       performance.
NOTE: No rows were selected.
84        
85         quit;
NOTE: PROCEDURE SQL used (Total process time):
       real time           8.47 seconds
       cpu time            3.38 seconds
      
 
86        
87         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
99
 
 
Kurt_Bremser
Super User

So it's a clear case for Maxim 3: Know Your Data.

First, run a proc contents on MYDATA.UK to see the length and format of SIC (if a format is assigned).

Then, create a quick sample:

data test;
set mydata.uk (keep=sic obs=100);
checkvar = put(sic,$HEXxx.);
run;

where "xx" is double the defined length of SIC. You can then inspect the hex values to see if you have special characters, leading blanks, etc.

Hang
Calcite | Level 5
CODE:
libname mydata '/folders/myfolders/Company data';
data work.banks;
 set mydata.uk;
 keep SIC CONML DCSFD DLC DLTT EBITDA;
 
 
proc contents data=mydata.uk;

data mydata.uk;
 set mydata.uk;
 if SIC= '6000' then output; 
run;
 
LOG:
 
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72        
73 libname mydata '/folders/myfolders/Company data';
NOTE: Libref MYDATA refers to the same physical library as _TEMP0.
NOTE: Libref MYDATA was successfully assigned as follows:
       Engine:        V9
       Physical Name: /folders/myfolders/Company data
74         data work.banks;
75         set mydata.uk;
76         keep SIC CONML DCSFD DLC DLTT EBITDA;
77        
78        
79        
 
NOTE: There were 0 observations read from the data set MYDATA.UK.
NOTE: The data set WORK.BANKS has 0 observations and 6 variables.
NOTE: DATA statement used (Total process time):
       real time           0.07 seconds
       cpu time            0.04 seconds
      
 
80         proc contents data=mydata.uk;
81        
82        
 
NOTE: PROCEDURE CONTENTS used (Total process time):
       real time           9.37 seconds
       cpu time            9.28 seconds
      
 
83         data mydata.uk;
84         set mydata.uk;
85         if SIC= '6000' then output;
86         run;
 
NOTE: There were 0 observations read from the data set MYDATA.UK.
NOTE: The data set MYDATA.UK has 0 observations and 474 variables.
NOTE: DATA statement used (Total process time):
       real time           0.14 seconds
       cpu time            0.11 seconds
      
 
87        
88         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
100
 
 
My results is still blank, can you let me know if the coding is correct. because otherwise, there would be no results that satisfy the condition
thanks:)
Kurt_Bremser
Super User

The proc contents is not meant to immediately solve the problem, but its output allows you to set up further steps to get to know your data.

 

What does the output of proc contents say about column SIC?

Hang
Calcite | Level 5

@Kurt_Bremserit says that the variable SIC is character 

 

yeah I figured proc contents is not the solution. This is the first time I used this so I'm not sure what to do exactly. so after going youtube and taking in you guys' suggestions

 

this is what I've got so far:

CODE

libname mydata '/folders/myfolders/Company data';
data work.banks;
 set mydata.UK_Quarterly;
 keep SIC CONML;

proc freq data=mydata.uk;
  tables work.banks;
  where SIC =: '6000';
data mydata.uk;
 set mydata.uk;
 if SIC= '6000' then output; 
run;
 
LOG
1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72        
73         libname mydata '/folders/myfolders/Company data';
NOTE: Libref MYDATA was successfully assigned as follows:
       Engine:        V9
       Physical Name: /folders/myfolders/Company data
74         data work.banks;
75         set mydata.UK_Quarterly;
NOTE: Data file MYDATA.UK_QUARTERLY.DATA is in a format that is native to another host, or the file encoding does not match the
       session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
       performance.
76         keep SIC CONML;
77        
78        
 
NOTE: There were 135194 observations read from the data set MYDATA.UK_QUARTERLY.
NOTE: The data set WORK.BANKS has 135194 observations and 2 variables.
NOTE: DATA statement used (Total process time):
       real time           24.62 seconds
       cpu time            10.96 seconds
      
 
79         proc freq data=mydata.uk;
80           tables work.banks;
                     __________
                     22
                     201
ERROR 22-322: Syntax error, expecting one of the following: a name, (, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_. 
ERROR 201-322: The option is not recognized and will be ignored.
81           where SIC =: '6000';
ERROR: Variable NAME not found.
82        
 
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE FREQ used (Total process time):
       real time           0.01 seconds
       cpu time            0.00 seconds
      
 
 
83         data mydata.uk;
84         set mydata.uk;
85         if SIC= '6000' then output;
86         run;
 
NOTE: There were 0 observations read from the data set MYDATA.UK.
NOTE: The data set MYDATA.UK has 0 observations and 474 variables.
NOTE: DATA statement used (Total process time):
       real time           0.06 seconds
       cpu time            0.02 seconds
      
 
87        
88         OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
100       
 
DO you know if there's a website that do SAS for me?

 

 

 

Kurt_Bremser
Super User

You have to look at the OUTPUT, not the log from proc contents. It will tell you the defined length of SIC, and if a format (other than just $w.) is assigned to it.

Then you can run the other step I showed you, that helps you in getting a grasp on the REAL values of SIC.

 

And while you do this, follow my Maxim #1 and read up on the documentation of all these tools we show you, to familiarize yourself with them and how they have to be used.

Astounding
PROC Star

Very likely, the data does not behave as expected.  You have no "6000" values in your data set.  Here's a way to check, while examining your data a bit:

 

proc freq data=mydata.uk;
  tables SIC;
  where SIC =: '60';
run;

Remember to include the colon in the =: comparison.

 

Edited:  Note the change, using "60" instead of "6000".

Hang
Calcite | Level 5

@AstoundingI've done that and the results still came out blank, does it mean that I have nothing satisfy the condition?

Astounding
PROC Star

Yes.  There should be a note on the log stating that zero observations were selected.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 15 replies
  • 2693 views
  • 0 likes
  • 4 in conversation