BookmarkSubscribeRSS Feed
potiu
Fluorite | Level 6

Hi,

 

I am facing problem after importing data in excel format.

 

Below is my procedure to access the saved data in order to start work on my dataset:

 

Each time I log in to my SAS Studio, I need to click on Servers Files and Folders and click on the saved data either in .xlsx or .sas format to run it.  As refer to the attachment label as 1, the output file I have run as work.cervicalcancer.

 

However, when I click on the Library -> Work -> cervicalcancer to run it, the Proc Datasets and Proc Prints statements show only work.query and work.details respectively (as shown as the picture below).

Thus, why it has different work file name appear instead only work.cervicalcancer? 

Thank you for your reply.

 

 

2.png

10 REPLIES 10
Reeza
Super User

I don't know what your question is. If you want to save the file permanently after importing it, save it to a library.

 

libname myFiles '/folders/myfolders/';

data myFiles.CervicalCancer;
set CervicalCancer;
run;

This copies the file to the location above. So the next time you want to use the dataset you assign the libname and you have the file available. 

potiu
Fluorite | Level 6
Thanks Reeza for your reply.

My problem is I do not understand why there are 2 different work names generated that is work.query and work.details, in the Work folder of Libraries as shown in the picture of my previous post message.

Can you help me by explaining further.

Thank you.
Reeza
Super User

Please post your code as text. I can't read the image and it doesn't scale. 

 

You have no comments in your code, comment each procedure with what you think/expect it to be doing.

potiu
Fluorite | Level 6
Hi Reeza,

The code run the imported data in Servers, files and folders and in the work folder of Libraries is as follow:

PROC SQL;
CREATE TABLE WORK.query AS
SELECT Age , Number_of_sexual_partners , 'First_sexual_intercourse_(age)'n , Number_of_pregnancies , Smokes , 'Smokes_(years)'n , 'Smokes_(packs/year)'n , Hormonal_Contraceptives , 'Hormonal_Contraceptives_(years)'n , IUD , 'IUD_(years)'n , STDs , 'STDs_(how_many?)'n , 'STDs:condylomatosis'n , 'STDs:cervical_condylomatosis'n , 'STDs:vaginal_condylomatosis'n , 'STDs:vulvo-perineal_condylomatos'n , 'STDs:syphilis'n , 'STDs:pelvic_inflammatory_disease'n , 'STDs:genital_herpes'n , 'STDs:molluscum_contagiosum'n , 'STDs:AIDS'n , 'STDs:HIV'n , 'STDs:Hepatitis_B'n , 'STDs:HPV'n , 'STDs: Number_of_diagnosis'n , 'STDs:Year_since_first_diagnosis'n , 'STDs:Year_since_last_diagnosis'n , 'Previous_Dx:Cancer'n , 'Previous_Dx:CIN'n FROM WORK.CERVICALCANCER;
RUN;
QUIT;

PROC DATASETS NOLIST NODETAILS;
CONTENTS DATA=WORK.query OUT=WORK.details;
RUN;

PROC PRINT DATA=WORK.details;
RUN;
Reeza
Super User

Comments really really really help here.

 

%*Creates a table called QUERY with the data from Cervical Cancer. There's no WHERE clause so there's no filter on the number of rows but perhaps it's selecting less columns than the main data set. If not, it's not actually doing anything here besides creating a duplicate dataset. 

PROC SQL;
CREATE TABLE WORK.query AS
SELECT Age , Number_of_sexual_partners , 'First_sexual_intercourse_(age)'n , Number_of_pregnancies , Smokes , 'Smokes_(years)'n , 'Smokes_(packs/year)'n , Hormonal_Contraceptives , 'Hormonal_Contraceptives_(years)'n , IUD , 'IUD_(years)'n , STDs , 'STDs_(how_many?)'n , 'STDs:condylomatosis'n , 'STDs:cervical_condylomatosis'n , 'STDs:vaginal_condylomatosis'n , 'STDs:vulvo-perineal_condylomatos'n , 'STDs:syphilis'n , 'STDsSmiley Tongueelvic_inflammatory_disease'n , 'STDs:genital_herpes'n , 'STDs:molluscum_contagiosum'n , 'STDs:AIDS'n , 'STDs:HIV'n , 'STDs:Hepatitis_B'n , 'STDs:HPV'n , 'STDs: Number_of_diagnosis'n , 'STDs:Year_since_first_diagnosis'n , 'STDs:Year_since_last_diagnosis'n , 'Previous_Dx:Cancer'n , 'Previous_Dx:CIN'n FROM WORK.CERVICALCANCER;
RUN;
QUIT;

%*Run a contents on the QUERY dataset from previous step to see field types and formats. The field names are stored in a dataset called Details;
PROC DATASETS NOLIST NODETAILS;
CONTENTS DATA=WORK.query OUT=WORK.details;
RUN;

%*Displays the output from proc datasets - which is the details dataset.;
PROC PRINT DATA=WORK.details;
RUN;
potiu
Fluorite | Level 6
Thanks Reeza for your explanation.

I still have 2 questions regarding this topic:
1. What does the OUT function means in the Proc Datasets statement?

2. The mentioned SAS codes from my previous post are generated by SAS after Run the code of the SAS code generated by SAS as well by importing my dataset. The codes is as follow:

FILENAME REFFILE '/home/alvinyong190/sasuser.v94/ecprg193/cervicalcancer.xlsx';

PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=WORK.cervicalcancer;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=WORK.cervicalcancer; RUN;

My questions is why the code for work.cervicalcancer is not able to link to the code generated in Work file of Libraries directory?

Thank you.
Reeza
Super User

1. OUT= in a SAS procedure means a data set will be created. What exactly is in that data set depends on the PROC. Note that this is for SAS PROCEDURES, not a Data Step.   For PROC DATASETS you can see the OUT=option on the CONTENTS statement.


Digging through the documetentation a bit, yields the following:

http://support.sas.com/documentation/cdl/en/proc/70377/HTML/default/viewer.htm#p1sy9ca8n2tv03n1savk4...

 

In general, the OUT will be noted in the documentation and there will be information on what's in the dataset.

 

2. I don't know what you're asking here or what you're expecting. It's not clear at all. 

potiu
Fluorite | Level 6
Hi Reeza,

For the question number 2,

The SAS codes starting with the Filename Refile ... is generated after imported the dataset in Excel to SAS. Refer to the code itself, SAS created a work file in the Libraries directories under Work Folder called cervicalcancer.

When I clicked on the cervicalcancer file of Work under the Libraries directory, the SAS codes inside the file is starting as Proc SQL ....

My question is why the first file name as work.cervicalcancer while the second file name contains work.query and work.details instead of having work.cervicalcancer in both files?

Thank you.
Reeza
Super User

Read the comments again. 

Identify the input data sets for each step and the output data sets. 

 

 

Details is a new dataset that has very different content than Cervical Cancer - it has data about the cervical cancer dataset. Open it. 

 

Query is from the SQL procedure. You ran it, so why did you do that? It creates a new table name because the table isn't usually identical to the source as it is in this case. You can easily add new variables or create summary statistics in the procedure so it creates a new table name. You have control over the output name and since it's Query, my guess is you didn't change the name. 

 

 

 

 

potiu
Fluorite | Level 6
Thanks Reeza for your explanation.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1686 views
  • 0 likes
  • 2 in conversation