Description
There are two mistakes in Example 6.1:
- a typo: it refers to `d` instead of `d2` for the mean imputation
- an unnecessary `lambda df2` that can be removed

Correct code, I believe:
```python
import re
import pandas as pd

# version of the guns polls with some errors
url = "https://cssbook.net/d/guns-polls-dirty.csv"
d2 = pd.read_csv(url)

# Option 1: clean with direct assignment
# Note that when creating a new column,
# you have to use df["col"] rather than df.col
# (regex=True is required on pandas >= 2.0, where str.replace is literal by default)
d2["rep2"] = d2.rep.str.replace("[^0-9\\.]", "", regex=True)
d2["rep2"] = pd.to_numeric(d2.rep2)
d2["Support2"] = d2.Support.fillna(d2.Support.mean())

# Alternatively, clean with .assign
# No lambda is needed here: d2.rep2 already exists from Option 1
cleaned = d2.assign(
    rep2=d2.rep.str.replace("[^0-9\\.]", "", regex=True),
    rep3=pd.to_numeric(d2.rep2),
    Support2=d2.Support.fillna(d2.Support.mean()),
)

# Finally, you can create your own function
def clean_num(x):
    x = re.sub("[^0-9\\.]", "", x)
    return int(x)

cleaned["rep3"] = cleaned.rep.apply(clean_num)
cleaned.head()
```
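
For context on the lambda point, here is a minimal sketch of when a lambda inside `.assign` does matter: only when one new column depends on another column created in the same call. The small DataFrame and the column names `rep_clean` and `rep_num` are made up for illustration and are not part of the book's data.

```python
import pandas as pd

# Hypothetical stand-in for the dirty polls data (values are made up)
df = pd.DataFrame({"rep": ["40", "4O", "35%"], "Support": [50.0, None, 60.0]})

# A callable passed to .assign receives the frame as built so far,
# so rep_num can use rep_clean even though df itself has no such column
cleaned = df.assign(
    rep_clean=lambda x: x.rep.str.replace(r"[^0-9.]", "", regex=True),
    rep_num=lambda x: pd.to_numeric(x.rep_clean),
    Support2=lambda x: x.Support.fillna(x.Support.mean()),
)

# Without the lambda, df.rep_clean is looked up on the original frame,
# which has no such column, so the attribute lookup fails
try:
    df.assign(
        rep_clean=df.rep.str.replace(r"[^0-9.]", "", regex=True),
        rep_num=pd.to_numeric(df.rep_clean),
    )
    lookup_failed = False
except AttributeError:
    lookup_failed = True
```

In the corrected example above the lambda can be dropped only because `d2.rep2` was already created by Option 1; if the `.assign` version were run on its own, a lambda (or a second `.assign` call) would be needed again.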