chapter 6: two mistakes in Example 6.1 #8

Example 6.1 contains two mistakes:
- a typo: the mean imputation refers to `d` instead of `d2`
- an unnecessary `lambda df2`, which can be removed
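
On the `lambda` point, here is a minimal sketch with a made-up two-row frame (the `toy` frame and its values are hypothetical, not the guns-polls data). It suggests when the anonymous function matters: a column can be referenced directly inside `.assign` only if it already exists on the frame, whereas a `lambda` receives the intermediate frame and so can see columns created earlier in the same call.

```python
import pandas as pd

# Hypothetical toy data, not the guns-polls file
toy = pd.DataFrame({"rep": ["45%", "x60"]})

# rep2 does not exist on toy yet, so toy.rep2 cannot be referenced directly
try:
    toy.assign(
        rep2=toy.rep.str.replace("[^0-9.]", "", regex=True),
        rep3=pd.to_numeric(toy.rep2),
    )
    direct_reference_worked = True
except AttributeError:
    direct_reference_worked = False

# With a lambda, rep3 is computed from the frame that already holds rep2
chained = toy.assign(
    rep2=toy.rep.str.replace("[^0-9.]", "", regex=True),
    rep3=lambda df: pd.to_numeric(df.rep2),
)
```

In Example 6.1 the direct reference works only because Option 1 has already added `rep2` to `d2` before `.assign` is called.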

Correct code, I believe:

```
import re

import pandas as pd

# version of the guns polls with some errors
url = "https://cssbook.net/d/guns-polls-dirty.csv"
d2 = pd.read_csv(url)

# Option 1: clean with direct assignment
# Note that when creating a new column,
# you have to use df["col"] rather than df.col
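# regex=True is needed in pandas >= 2.0, where str.replace
# treats the pattern as a literal string by default
d2["rep2"] = d2.rep.str.replace("[^0-9\\.]", "", regex=True)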
d2["rep2"] = pd.to_numeric(d2.rep2)
d2["Support2"] = d2.Support.fillna(d2.Support.mean())

# Alternatively, clean with .assign
# Note: no anonymous function (lambda) is needed here because
# d2.rep2 already exists from Option 1; a lambda is only needed
# to reference a column created within the same .assign call
cleaned = d2.assign(
    rep2=d2.rep.str.replace("[^0-9\\.]", "", regex=True),
    rep3=pd.to_numeric(d2.rep2),
    Support2=d2.Support.fillna(d2.Support.mean()),
)

# Finally, you can create your own function
def clean_num(x):
    x = re.sub("[^0-9\\.]", "", x)
    return int(x)

cleaned["rep3"] = cleaned.rep.apply(clean_num)
cleaned.head()
```
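
If the `rep` column contains missing values or decimals (an assumption, since the file contents are not shown here), `re.sub` would raise a `TypeError` on a non-string value and `int` would raise a `ValueError` on something like `"45.5"`. A more defensive variant, sketched under those assumptions with the hypothetical name `clean_num_safe`, might look like this:

```python
import re


def clean_num_safe(x):
    """Strip non-numeric characters; return a float, or None if nothing usable remains."""
    if not isinstance(x, str):
        return None  # e.g. NaN or None from a missing cell
    digits = re.sub(r"[^0-9.]", "", x)
    try:
        return float(digits)
    except ValueError:
        return None  # empty string or malformed number such as "1.2.3"
```

It could be applied the same way as `clean_num`, for example `cleaned.rep.apply(clean_num_safe)`; returning a float instead of an int is a design choice made here so that decimals survive.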
