BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
SixUnder
Obsidian | Level 7

Ok, i have to create a .csv file that will list incomes for women in list format from 10-20k, 20k-30k and so on until 100k or more. It is in two columns, one is full time and the other is part time. Their wages are random entries in the cells. How would you incorporate this design? Or better yet how would you write out the code for this project?

 

Release: 3.5 (Enterprise Edition) of SAS

 

Write a SAS® program to access the Microsoft®Excel®spreadsheet and select the approximately 20 observations for the total population estimate of females that worked full-time and year-round.

Create a variable that contains the maximum earning in the Microsoft® Excel® spreadsheet Column A labels. As an example, for label $2,500 to $4,999, the variable would include value 4,999. This is in addition to the variable that contains the associated population number.

Store this data in a raw data file. Sort the data from highest amount earned to lowest amount earned. Store this sorted data in a second raw data file. Perform another sort to order the data from highest population number to lowest population number. Store this sorted data in a third raw data file. Keep the two related data variables associated so that after each sort the amount earned is associated with the population number or the number of people that earned that amount.

1 ACCEPTED SOLUTION

Accepted Solutions
SixUnder
Obsidian | Level 7
Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Wages_Low Wages_High Estimated Max_Earnings;

Proc Print Data=Calculation;
run;


then a program for sort 2


Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Wages_Low Wages_High Estimated Max_Earnings;

Proc Sort Data=Calculation Out=Sort2;
By Descending Max_Earnings;
Run;

and sort 3...


Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Salary_Low Salary_High Estimate Max_Earnings;

Proc Sort Data=Calculation Out=Sort3;
Key Estimated / descending;
Run;


Would that make 3 tables, showing the requested information?


If i make my excel file like this...


Gender- Salary Low End- Salary High End- Esitmates- Maximum Earnings

female            1                    1                        1                       1

female           5                    7                         6                        8


etc. down to the 20 requested


Will that work?

View solution in original post

23 REPLIES 23
mkeintz
PROC Star

@SixUnder wrote:

...  How would you incorporate this design? Or better yet how would you write out the code for this project?

 

How much credit for this homework?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SixUnder
Obsidian | Level 7

I asked for approaches, not the answers. As in pseudocode, or a layout. Perhaps my lingo is not as special as the rest of people who work with SAS everyday, I could of written in different. I am trying to understand and the learn the concepts, and the syntax in a short time frame. Please no trolling.

mkeintz
PROC Star

I appreciate your earnestness.  What strategies have you contemplated?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
SixUnder
Obsidian | Level 7

Of course, I am very new to SAS, so the basic principles of it are still not my forteit. I want to lay this our like any other language but this problem seems complex to me. I understand the basic of course.

 

Data step

input step

 

possible options

library admin

data sorting for display- one set of values

 

and then of course

 

proc print data etc

 

This question is asking for so much, and i cant translate it into the steps one at a time for the program, at least not in my head.

 

1. create a variable that contains the max earning for column A

(it adds this curveball in there, As an example, for label $2,500 to $4,999, the variable would include value 4,999. This is in addition to the variable that contains the associated population number)

 

2. store data in a raw file, does this mean in the program? or in another .csv file?

 

3. sort the data from highest earned to lowest money earned

 

4. perform a additional sort for highest to lowest population

 

5. store this in a 3rd raw data file(again no clue there)

 

6. keep the two data variables associated( it only asked for one) so after each sort the amount earned matched the poplulation number.

 

I guess those are my steps, at least once i create and upload a .csv file, i just dont know what im doing, or im in over my head.

Reeza
Super User

Proc sort for sorting. You can use an OUT dataset to keep a unique copy though it seems weird. 

 

Proc sort data=SASHELP.class out=class_age;

by age;

run;

 

Honestly without seeing the data I'm not sure we understand all the requirements. Rather than tackle all the requirements, start the first one and then move on from there. If you can get your pseudocode, then here has examples on many of the common tasks. 

 

http://support.sas.com/training/tutorial/

 

Also, I'm not sure I'd call the comments trolling. You literally copied and pasted your question into a forum and asked 'how would you write out the code'. That isn't a request for methods or pseudocode in my book either. 

LinusH
Tourmaline | Level 20
I think that the instructions is as close to pseudo code it can get, all step are described there.
So what's missing is basic SAS programming skills which your training should serve you...But if that isn't enough for you take the free online Programming 1 course.
Data never sleeps
ballardw
Super User

A certain amout of ordering can be done within procedure output for reports so "sorting" may not be needed.

Also Formats are a very good way to associate specific values of a variable with a display category so keeping a category with a value is trivial.

 

I wouldn't even worry about "sorting" until I knew what the desired result looked like. The "sort" instructions look like something I might expect from a spreadsheet user who is used to having to physically reorder data at each step of process. Attempting to duplicate the entire process step-by-step of a spreadsheet process often leads to extremely convoluted and hard to maintain SAS code.

Consider: ( you should have the sashelp.class data set available)

proc tabulate data=sashelp.class;
   class sex/ order=freq;
   class age/order=data;
   var height;
   tables sex*age,
          n height*(min max mean);
run;

proc tabulate data=sashelp.class;
   class sex/ order=formatted;
   class age/order=freq;
   var height;
   tables sex*age,
          n height*(min max mean);
run;

Results in different output order for the two tables WITHOUT actually "sorting" the data set.

 

Formats can control appearance order, order of appearance in a data set, frequency of a value, the actual value and for some procedures you may get other options.

SixUnder
Obsidian | Level 7

here is the data in a comma delimited format. Just so everyone can see what I am working with.

Reeza
Super User

Having seen your data, here's the pseudo code and the pointers to the relevant SAS way to do it. I'm not writing code. This assumes you've already imported the data into SAS in the first place successfully.

 



 

Write a SAS® program to access the Microsoft®Excel®spreadsheet and select the approximately 20 observations for the total population estimate of females that worked full-time and year-round.

 


1. Filter the data for only the variable of interest since your data has multiple measurements in the same column. This would be a WHERE or IF/THEN statement in a data step.

 


Create a variable that contains the maximum earning in the Microsoft® Excel® spreadsheet Column A labels. As an example, for label $2,500 to $4,999, the variable would include value 4,999. This is in addition to the variable that contains the associated population number.

 


Recode the variable so that you have a numeric value instead of the value 5000- 7499. This is likely so you can sort it properly since text sorts alphabetically.  You can accomplish this with an IF/THEN statement, ideally in the same data step as the step above.

 


Store this data in a raw data file. Sort the data from highest amount earned to lowest amount earned. Store this sorted data in a second raw data file. Perform another sort to order the data from highest population number to lowest population number. Store this sorted data in a third raw data file. Keep the two related data variables associated so that after each sort the amount earned is associated with the population number or the number of people that earned that amount.


 

These are the proc sorts I initially mentioned in my first answer. 

 

How to accomplish step 1/2 are also in the first link I posted with the YouTube videos.

Good Luck. 

 

art297
Opal | Level 21

While it at first sounds like a homework assignment, it is more difficult than any homework assignment I've seen.

 

Here is how I would create the initial dataset. Your on your own, from there, determining how to answer the rest of the questions:

 

proc import datafile="c:\art\data.csv" out=datain replace dbms=csv;
  getnames=no;
  datarow=4;
  guessingrows=32767;
run;

data need (drop=var2 var3);
  set datain (rename=(var1=geo_id));
  if substr(var2,1,89)='Estimate; Female: - Worked full-time, year-round in the past 12 months: - With income: -';
  if substr(reverse(strip(var2)),1,7) eq 'erom ro' then salary=100000;
  else if substr(reverse(strip(var2)),1,7) eq 'ssol ro' then salary=2499;
  else salary=input(substr(var2,index(var2,'to ')+4),comma8.);
  count=input(var3,8.);
run;

HTH,

Art, CEO, AnalystFinder.com

 

Reeza
Super User

@art297 wrote:

While it at first sounds like a homework assignment, it is more difficult than any homework assignment I've seen.

 

HTH,

Art, CEO, AnalystFinder.com

 


Not for a programming or analysis course. This should be expected as base level for any entry level analyst. It covers the base requirements IMO, so another option is, it could be an 'homework' task for a hiring process. All analysts here get hired after completing a programming task. In the last one, the person who got hired passed the first step...the rest couldn't read in the data properly Woman Frustrated We don't dictate the language though, because its the process of problem solving that's more important than specifics. 

 

1. Import a data set

2. Create new variables

3. Produce a list report

4. Produce a summary report (ie proc freq/means)

 

 

art297
Opal | Level 21

@Reeza: You may be right! Conversely, I've encountered many employed folks, over the years, who needed some help solving a problem of this complexity. On the other hand, knowing where to go and what to ask (and, as a result, being able to solve a problem), isn't a bad skill in itself 🙂

 

Art, CEO, AnalystFinder.com

 

Reeza
Super User

@art297 True. I think the ability to program is almost expected from a University graduate these days now.

SixUnder
Obsidian | Level 7

I am going to try and create the data step, the input and possibly the variables and go from there. I will come back here and post once i get that far. Whos knows, you are all much more advanced than I am.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 1996 views
  • 1 like
  • 6 in conversation