BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
venkatnaveen
Obsidian | Level 7

DATA Description

We have an Excel Sheet in that we have 1,000,000 Observations and 150 Variables.

Some of the Records(Entire Row) have Missing values.

TASK

Import the Excel Sheet into SAS Data Set with the best available technique . Remove all Missing Records in the SAS Data Set(If possible import only Non Missing Records).
Now we need to create Two SAS Date Sets "ONE" and "TWO".
Data Set "ONE" should have all the Character Variables with 25,000 Random Records.


Data Set "TWO" should have all the Numeric Variables with 50,000 Random Records.

1 ACCEPTED SOLUTION

Accepted Solutions
jakarman
Barite | Level 11

I this a real-life question? 

Excel I not a professional tool for date-entry. Your first question is about data-cleansing.

The best way is converting it from excel to a text/csv file. Reading this file with input statements eliminates all uncertainty about "guessing rows" used in those entry level import approaches. An all missing fields can be dealt with. Numerics/chars are well defined as you are coding those. Your brain has to decide.

A split with one/two dataset looks to be the validate/training approach of data mining.  But that split in numeric/char is a weird one. In the digitial age this is one to think first of what is the meaning. You should think in category/measure and by that some specials of that with date/time and geography.    

For the technical part in SAS there are variable lists. SAS(R) 9.4 Language Reference: Concepts, Third Edition there are special ones as _numeric_ and _char_.

When you want to be technical SAS stores those physical grouped like that ann not in the logical order of creation.

---->-- ja karman --<-----

View solution in original post

2 REPLIES 2
jakarman
Barite | Level 11

I this a real-life question? 

Excel I not a professional tool for date-entry. Your first question is about data-cleansing.

The best way is converting it from excel to a text/csv file. Reading this file with input statements eliminates all uncertainty about "guessing rows" used in those entry level import approaches. An all missing fields can be dealt with. Numerics/chars are well defined as you are coding those. Your brain has to decide.

A split with one/two dataset looks to be the validate/training approach of data mining.  But that split in numeric/char is a weird one. In the digitial age this is one to think first of what is the meaning. You should think in category/measure and by that some specials of that with date/time and geography.    

For the technical part in SAS there are variable lists. SAS(R) 9.4 Language Reference: Concepts, Third Edition there are special ones as _numeric_ and _char_.

When you want to be technical SAS stores those physical grouped like that ann not in the logical order of creation.

---->-- ja karman --<-----
ballardw
Super User

Since this looks sort of like homework instead of a complete solution for the second part of the question:

Look at proc surveyselect, using a KEEP dataset option of _numeric_ or _character_ on the input data set.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 2531 views
  • 2 likes
  • 3 in conversation