Data cleansing
Real-world data is seldom ready to use as-is.
It is often messy: values may be missing, outliers may be present, and so on.
With respect to credit ratings, rating agencies often attach rating outlooks or
rating watches. These indicate the direction in which the rating agency will
probably change the rating going forward. When an outlook has been assigned to a
rating, it might look something like AA- *+, i.e. the outlook follows the asterisk.
These "attachments" cause problems. Consider a BBB- rating with a negative
outlook. This means that the rating agency might lower the rating in the
foreseeable future. What rating score should such a rating be assigned? Usually, a
BBB- rating is equivalent to a rating score of 10 (see Long-term ratings).
Should we assign a rating score of 11 just because of the outlook? Probably not!
Firstly, the rating hasn't been lowered as of today, and, secondly, a lower rating in
the future is not certain at all.
In fact, if you care about the current status quo, it is usually best to ignore credit outlooks and credit watches altogether. That is, clean your data!
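To make the point concrete, here is a minimal sketch of a score lookup. The dictionary is a hypothetical excerpt of a rating scale, not part of pyratings: only the BBB- = 10 entry comes from the text above, and the neighbouring scores are assumed for illustration.

```python
# Illustrative excerpt of a long-term rating scale. Only BBB- = 10 is taken
# from the text above; the neighbouring scores are assumed for illustration.
RATING_SCORES = {
    "BBB+": 8,
    "BBB": 9,
    "BBB-": 10,
    "BB+": 11,
}

raw_rating = "BBB- *-"  # BBB- with a negative indicator attached
pure_rating = "BBB-"    # the same rating after cleaning

# The raw string is not a key of the scale, so the lookup yields None.
raw_score = RATING_SCORES.get(raw_rating)
# The cleaned rating maps to its current score.
pure_score = RATING_SCORES.get(pure_rating)

print(raw_score, pure_score)
```

The raw string cannot be mapped at all, while the cleaned rating maps to the score that reflects the rating as it stands today.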
There is at least one other reason that makes cleansing necessary: unsolicited ratings.
An unsolicited rating is usually designated by the letter "u", which is directly
attached to the actual rating, e.g. AA-u. To translate the rating into a score and
use it properly in any kind of computation, you had better get rid of
this letter.
For all these cases, pyratings offers a function called get_pure_ratings. Its
sole purpose is to clean ratings, i.e. remove watches/outlooks and the letter "u".
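To give a feeling for what this kind of cleaning involves, the following is a rough sketch using only the standard library. It is not how pyratings implements get_pure_ratings. The function name strip_rating_suffixes and the regular expression are hypothetical, and the pattern assumes S&P/Fitch-style long-term symbols (letters A to D with an optional "+" or "-").

```python
import re

# Hypothetical pattern: S&P/Fitch-style long-term ratings only (AAA ... D),
# optionally followed by a "+" or "-" modifier. Everything after that
# (watch, outlook, "u") is ignored.
_RATING_PATTERN = re.compile(r"^\s*([A-D]{1,3}[+-]?)")


def strip_rating_suffixes(rating):
    """Return the leading rating symbol, dropping watches, outlooks, and 'u'."""
    if not isinstance(rating, str):
        return rating  # leave missing values (e.g. NaN) untouched
    match = _RATING_PATTERN.match(rating)
    # Fall back to the unchanged input if it does not look like a rating.
    return match.group(1) if match else rating


for raw in ["BBB+u", "AA *-", "AA- (Developing)", "CCC+ (CwPositive)"]:
    print(raw, "->", strip_rating_suffixes(raw))
```

Matching the leading symbol, rather than listing every possible suffix, keeps the sketch short, but it would not handle other notations such as Moody's or short-term ratings.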
Before starting, let's import some libraries.
import pandas as pd
import numpy as np
import pyratings as rtg
Cleaning single ratings
unsolicited_rating = "BBB+u"
rtg.get_pure_ratings(ratings=unsolicited_rating)
'BBB+'
rating_with_outlook = "AA *-"
rtg.get_pure_ratings(ratings=rating_with_outlook)
'AA'
Cleaning a pd.DataFrame
It's also possible to pass a pd.DataFrame and have all cells cleaned at once.
Note that the column headers will be suffixed with "_clean".
rtg_df = pd.DataFrame(
    data={
        "rtg_SP": [
            "BB+ *-",
            "BBB *+",
            np.nan,
            "AA- (Developing)",
            np.nan,
            "CCC+ (CwPositive)",
            "BB+u",
        ],
        "rtg_Fitch": [
            "BB+ *-",
            "BBB *+",
            pd.NA,
            "AA- (Developing)",
            np.nan,
            "CCC+ (CwPositive)",
            "BB+u",
        ],
    },
)
rtg_df
| | rtg_SP | rtg_Fitch |
|---|---|---|
| 0 | BB+ *- | BB+ *- |
| 1 | BBB *+ | BBB *+ |
| 2 | NaN | <NA> |
| 3 | AA- (Developing) | AA- (Developing) |
| 4 | NaN | NaN |
| 5 | CCC+ (CwPositive) | CCC+ (CwPositive) |
| 6 | BB+u | BB+u |
rtg.get_pure_ratings(ratings=rtg_df)
| | rtg_SP_clean | rtg_Fitch_clean |
|---|---|---|
| 0 | BB+ | BB+ |
| 1 | BBB | BBB |
| 2 | NaN | <NA> |
| 3 | AA- | AA- |
| 4 | NaN | NaN |
| 5 | CCC+ | CCC+ |
| 6 | BB+ | BB+ |
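For comparison, a similar column-wise result can be approximated with plain pandas string methods. This is only a sketch under assumptions, not the library's implementation: the regular expression is hypothetical and covers S&P/Fitch-style long-term symbols only, and the "_clean" suffix is added by hand to mirror the output above.

```python
import numpy as np
import pandas as pd

# Assumed pattern: S&P/Fitch-style long-term symbols with an optional
# "+" or "-" modifier; anything after the symbol is dropped.
PATTERN = r"^\s*([A-D]{1,3}[+-]?)"

raw_df = pd.DataFrame(
    {
        "rtg_SP": ["BB+ *-", np.nan, "AA- (Developing)", "BB+u"],
        "rtg_Fitch": ["BBB *+", "CCC+ (CwPositive)", np.nan, "BB+u"],
    }
)

# Extract the leading rating symbol column by column; missing values stay missing.
clean_df = pd.DataFrame(
    {
        f"{name}_clean": column.str.extract(PATTERN, expand=False)
        for name, column in raw_df.items()
    }
)

print(clean_df)
```

In practice, get_pure_ratings is the more convenient choice because it accepts single ratings and whole DataFrames through the same call.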