Skip to content

chapter 6: two mistakes in Example 6.1 #8

@damian0604

Description

@damian0604

There are two mistakes in Example 6.1

  • a typo: it refers to d instead of d2 for the mean imputation
  • an unnecessary lambda df2 that can be removed

Correct code, I believe:

# version of the guns polls with some errors
url = "https://cssbook.net/d/guns-polls-dirty.csv"
d2 = pd.read_csv(url)

# Option 1: clean with direct assignment
# Note that when creating a new column,
# you have to use df["col"] rather than df.col
d2["rep2"] = d2.rep.str.replace("[^0-9\\.]", "")
d2["rep2"] = pd.to_numeric(d2.rep2)
d2["Support2"] = d2.Support.fillna(d2.Support.mean())

# Alternatively, clean with .assign
# Note the need to use an anonymous function
# (lambda) to chain calculations
cleaned = d2.assign(
    rep2=d2.rep.str.replace("[^0-9\\.]", ""),
    rep3=pd.to_numeric(d2.rep2),
    Support2=d2.Support.fillna(d2.Support.mean()),
)

# Finally, you can create your own function
def clean_num(x):
    x = re.sub("[^0-9\\.]", "", x)
    return int(x)

cleaned["rep3"] = cleaned.rep.apply(clean_num)
cleaned.head()```

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions