DATA Step, Macro, Functions and more

Sorting data, numerous times- How would you try this?

Accepted Solution Solved
Reply
Contributor
Posts: 33
Accepted Solution

Sorting data, numerous times- How would you try this?

Ok, i have to create a .csv file that will list incomes for women in list format from 10-20k, 20k-30k and so on until 100k or more. It is in two columns, one is full time and the other is part time. Their wages are random entries in the cells. How would you incorporate this design? Or better yet how would you write out the code for this project?

 

Release: 3.5 (Enterprise Edition) of SAS

 

Write a SAS® program to access the Microsoft®Excel®spreadsheet and select the approximately 20 observations for the total population estimate of females that worked full-time and year-round.

Create a variable that contains the maximum earning in the Microsoft® Excel® spreadsheet Column A labels. As an example, for label $2,500 to $4,999, the variable would include value 4,999. This is in addition to the variable that contains the associated population number.

Store this data in a raw data file. Sort the data from highest amount earned to lowest amount earned. Store this sorted data in a second raw data file. Perform another sort to order the data from highest population number to lowest population number. Store this sorted data in a third raw data file. Keep the two related data variables associated so that after each sort the amount earned is associated with the population number or the number of people that earned that amount.


Accepted Solutions
Solution
‎02-20-2017 08:29 PM
Contributor
Posts: 33

Re: Sorting data, numerous times- How would you try this?

Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Wages_Low Wages_High Estimated Max_Earnings;

Proc Print Data=Calculation;
run;


then a program for sort 2


Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Wages_Low Wages_High Estimated Max_Earnings;

Proc Sort Data=Calculation Out=Sort2;
By Descending Max_Earnings;
Run;

and sort 3...


Data Calculation;
infile '.csv' DSD Missover firstobs=2;
Input Gender $ Salary_Low Salary_High Estimate Max_Earnings;

Proc Sort Data=Calculation Out=Sort3;
Key Estimated / descending;
Run;


Would that make 3 tables, showing the requested information?


If i make my excel file like this...


Gender- Salary Low End- Salary High End- Esitmates- Maximum Earnings

female            1                    1                        1                       1

female           5                    7                         6                        8


etc. down to the 20 requested


Will that work?

View solution in original post


All Replies
Trusted Advisor
Posts: 1,018

Re: Sorting data, numerous times- How would you try this?


SixUnder wrote:

...  How would you incorporate this design? Or better yet how would you write out the code for this project?

 

How much credit for this homework?

Contributor
Posts: 33

Re: Sorting data, numerous times- How would you try this?

I asked for approaches, not the answers. As in pseudocode, or a layout. Perhaps my lingo is not as special as the rest of people who work with SAS everyday, I could of written in different. I am trying to understand and the learn the concepts, and the syntax in a short time frame. Please no trolling.

Trusted Advisor
Posts: 1,018

Re: Sorting data, numerous times- How would you try this?

I appreciate your earnestness.  What strategies have you contemplated?

Contributor
Posts: 33

Re: Sorting data, numerous times- How would you try this?

[ Edited ]

Of course, I am very new to SAS, so the basic principles of it are still not my forteit. I want to lay this our like any other language but this problem seems complex to me. I understand the basic of course.

 

Data step

input step

 

possible options

library admin

data sorting for display- one set of values

 

and then of course

 

proc print data etc

 

This question is asking for so much, and i cant translate it into the steps one at a time for the program, at least not in my head.

 

1. create a variable that contains the max earning for column A

(it adds this curveball in there, As an example, for label $2,500 to $4,999, the variable would include value 4,999. This is in addition to the variable that contains the associated population number)

 

2. store data in a raw file, does this mean in the program? or in another .csv file?

 

3. sort the data from highest earned to lowest money earned

 

4. perform a additional sort for highest to lowest population

 

5. store this in a 3rd raw data file(again no clue there)

 

6. keep the two data variables associated( it only asked for one) so after each sort the amount earned matched the poplulation number.

 

I guess those are my steps, at least once i create and upload a .csv file, i just dont know what im doing, or im in over my head.

Super User
Posts: 19,768

Re: Sorting data, numerous times- How would you try this?

Proc sort for sorting. You can use an OUT dataset to keep a unique copy though it seems weird. 

 

Proc sort data=SASHELP.class out=class_age;

by age;

run;

 

Honestly without seeing the data I'm not sure we understand all the requirements. Rather than tackle all the requirements, start the first one and then move on from there. If you can get your pseudocode, then here has examples on many of the common tasks. 

 

http://support.sas.com/training/tutorial/

 

Also, I'm not sure I'd call the comments trolling. You literally copied and pasted your question into a forum and asked 'how would you write out the code'. That isn't a request for methods or pseudocode in my book either. 

Super User
Posts: 5,424

Re: Sorting data, numerous times- How would you try this?

I think that the instructions is as close to pseudo code it can get, all step are described there.
So what's missing is basic SAS programming skills which your training should serve you...But if that isn't enough for you take the free online Programming 1 course.
Data never sleeps
Super User
Posts: 11,343

Re: Sorting data, numerous times- How would you try this?

A certain amout of ordering can be done within procedure output for reports so "sorting" may not be needed.

Also Formats are a very good way to associate specific values of a variable with a display category so keeping a category with a value is trivial.

 

I wouldn't even worry about "sorting" until I knew what the desired result looked like. The "sort" instructions look like something I might expect from a spreadsheet user who is used to having to physically reorder data at each step of process. Attempting to duplicate the entire process step-by-step of a spreadsheet process often leads to extremely convoluted and hard to maintain SAS code.

Consider: ( you should have the sashelp.class data set available)

proc tabulate data=sashelp.class;
   class sex/ order=freq;
   class age/order=data;
   var height;
   tables sex*age,
          n height*(min max mean);
run;

proc tabulate data=sashelp.class;
   class sex/ order=formatted;
   class age/order=freq;
   var height;
   tables sex*age,
          n height*(min max mean);
run;

Results in different output order for the two tables WITHOUT actually "sorting" the data set.

 

Formats can control appearance order, order of appearance in a data set, frequency of a value, the actual value and for some procedures you may get other options.

Contributor
Posts: 33

Re: Sorting data, numerous times- How would you try this?

here is the data in a comma delimited format. Just so everyone can see what I am working with.

Super User
Posts: 19,768

Re: Sorting data, numerous times- How would you try this?

[ Edited ]

Having seen your data, here's the pseudo code and the pointers to the relevant SAS way to do it. I'm not writing code. This assumes you've already imported the data into SAS in the first place successfully.

 



 

Write a SAS® program to access the Microsoft®Excel®spreadsheet and select the approximately 20 observations for the total population estimate of females that worked full-time and year-round.

 


1. Filter the data for only the variable of interest since your data has multiple measurements in the same column. This would be a WHERE or IF/THEN statement in a data step.

 


Create a variable that contains the maximum earning in the Microsoft® Excel® spreadsheet Column A labels. As an example, for label $2,500 to $4,999, the variable would include value 4,999. This is in addition to the variable that contains the associated population number.

 


Recode the variable so that you have a numeric value instead of the value 5000- 7499. This is likely so you can sort it properly since text sorts alphabetically.  You can accomplish this with an IF/THEN statement, ideally in the same data step as the step above.

 


Store this data in a raw data file. Sort the data from highest amount earned to lowest amount earned. Store this sorted data in a second raw data file. Perform another sort to order the data from highest population number to lowest population number. Store this sorted data in a third raw data file. Keep the two related data variables associated so that after each sort the amount earned is associated with the population number or the number of people that earned that amount.


 

These are the proc sorts I initially mentioned in my first answer. 

 

How to accomplish step 1/2 are also in the first link I posted with the YouTube videos.

Good Luck. 

 

PROC Star
Posts: 7,467

Re: Sorting data, numerous times- How would you try this?

While it at first sounds like a homework assignment, it is more difficult than any homework assignment I've seen.

 

Here is how I would create the initial dataset. Your on your own, from there, determining how to answer the rest of the questions:

 

proc import datafile="c:\art\data.csv" out=datain replace dbms=csv;
  getnames=no;
  datarow=4;
  guessingrows=32767;
run;

data need (drop=var2 var3);
  set datain (rename=(var1=geo_id));
  if substr(var2,1,89)='Estimate; Female: - Worked full-time, year-round in the past 12 months: - With income: -';
  if substr(reverse(strip(var2)),1,7) eq 'erom ro' then salary=100000;
  else if substr(reverse(strip(var2)),1,7) eq 'ssol ro' then salary=2499;
  else salary=input(substr(var2,index(var2,'to ')+4),comma8.);
  count=input(var3,8.);
run;

HTH,

Art, CEO, AnalystFinder.com

 

Super User
Posts: 19,768

Re: Sorting data, numerous times- How would you try this?


art297 wrote:

While it at first sounds like a homework assignment, it is more difficult than any homework assignment I've seen.

 

HTH,

Art, CEO, AnalystFinder.com

 


Not for a programming or analysis course. This should be expected as base level for any entry level analyst. It covers the base requirements IMO, so another option is, it could be an 'homework' task for a hiring process. All analysts here get hired after completing a programming task. In the last one, the person who got hired passed the first step...the rest couldn't read in the data properly Woman Frustrated We don't dictate the language though, because its the process of problem solving that's more important than specifics. 

 

1. Import a data set

2. Create new variables

3. Produce a list report

4. Produce a summary report (ie proc freq/means)

 

 

PROC Star
Posts: 7,467

Re: Sorting data, numerous times- How would you try this?

@Reeza: You may be right! Conversely, I've encountered many employed folks, over the years, who needed some help solving a problem of this complexity. On the other hand, knowing where to go and what to ask (and, as a result, being able to solve a problem), isn't a bad skill in itself Smiley Happy

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 19,768

Re: Sorting data, numerous times- How would you try this?

@art297 True. I think the ability to program is almost expected from a University graduate these days now.

Contributor
Posts: 33

Re: Sorting data, numerous times- How would you try this?

I am going to try and create the data step, the input and possibly the variables and go from there. I will come back here and post once i get that far. Whos knows, you are all much more advanced than I am.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 23 replies
  • 242 views
  • 1 like
  • 6 in conversation