Hello everyone, I am new to SAS programming, and working on a personal project to analyze used cars listing on Craigslist between April and May 2021. Here is the dataset that I am using: https://www.kaggle.com/austinreese/craigslist-carstrucks-data I found a similar blog post on the towards data science website (https://towardsdatascience.com/end-to-end-data-science-project-predicting-used-car-prices-using-regression-1b12386c69c8) where the author, Jose Portillo, used an older dataset with the same variables. However, Jose coded his project in Python whereas I am doing mine in SAS. import re
manufacturer = '(gmc | hyundai | toyota | mitsubishi | ford | chevrolet | ram | buick | jeep | dodge | subaru | nissan | audi | rover | lexus \
| honda | chrysler | mini | pontiac | mercedes-benz | cadillac | bmw | kia | volvo | volkswagen | jaguar | acura | saturn | mazda | \
mercury | lincoln | infiniti | ferrari | fiat | tesla | land rover | harley-davidson | datsun | alfa-romeo | morgan | aston-martin | porche \
| hennessey)'
condition = '(excellent | good | fair | like new | salvage | new)'
fuel = '(gas | hybrid | diesel |electric)'
title_status = '(clean | lien | rebuilt | salvage | missing | parts only)'
transmission = '(automatic | manual)'
drive = '(4x4 | awd | fwd | rwd | 4wd)'
size = '(mid-size | full-size | compact | sub-compact)'
type_ = '(sedan | truck | SUV | mini-van | wagon | hatchback | coupe | pickup | convertible | van | bus | offroad)'
paint_color = '(red | grey | blue | white | custom | silver | brown | black | purple | green | orange | yellow)'
cylinders = '(\s[1-9] cylinders? |\s1[0-6]? cylinders?)'
keys = ['manufacturer', 'condition', 'fuel', 'title_status', 'transmission', 'drive','size', 'type', 'paint_color' , 'cylinders']
columns = [ manufacturer, condition, fuel, title_status, transmission ,drive, size, type_, paint_color, cylinders]
for i,column in zip(keys,columns):
database[i] = database[i].fillna(
database['description'].str.extract(column, flags=re.IGNORECASE, expand=False)).str.lower()
database.drop('description', axis=1, inplace= True) There are some minor differences such as the drive, type, and num_of_cylinders. However, almost all the variables are the same. If we hold all the variables constant, how can I write the same code in SAS to extract data from the description column to fill in the missing values in the manufacturer, condition, fuel, transmission, and so on columns? I think I have created the list correctly, however here is what I have done so far to format each column and its observations into a list. %let manufacturer = (gmc,hyundai,toyota,mitsubishi,ford,chevrolet,ram,buick,jeep,dodge,subaru,nissan,audi,rover,lexus,honda,chrysler,mini,pontiac,mercedes-benz,cadillac,bmw,kia,volvo,volkswagen,jaguar,acura,saturn,mazda,mercury,lincoln,infiniti,ferrari,fiat,tesla,land rover,harley-davidson,datsun,alfa-romeo,morgan,aston-martin,porche,hennessey); %let condition = (excellent,good,fair,like new,salvage,new); %let fuel = (gas,hybrid,diesel,electric); %let title_status = (clean,lien,rebuilt,salvage,missing,parts only); %let transmission = (automatic,manual); %let drive = (4wd,fwd,rwd); %let size = (mid-size,full-size,compact,sub-compact); %let type = (sedan,truck,SUV,mini-van,wagon,hatchback,coupe,pickup,convertible,van,bus,offroad); %let paint_color = (red,grey,blue,white,custom,silver,brown,black,purple,green,orange,yellow); %let cylinders = (s[1-9] cylinders?, s1[0-6]? cylinders?);
... View more