Data cleansing

Raw data are seldom ready to use. They are often messy: values may be missing, there may be outliers, and so on.
Credit ratings are no exception. Rating agencies often attach a rating outlook or a rating watch to a rating to indicate the direction in which they will probably change the rating going forward. A rating that carries such an attachment might look like AA- *+, i.e. the outlook follows the star sign.

These "attachments" cause problems. Consider a BBB- rating with a negative outlook. This means that the rating agency might lower the rating in the foreseeable future. What rating score should such a rating be assigned? Usually, a BBB- rating is equivalent to a rating score of 10 (see Long-term ratings).
Should we assign a rating score of 11 just because of the outlook? Probably not!
Firstly, the rating hasn't been lowered as of today, and, secondly, a lower rating in the future is by no means certain.

If what matters is the rating as it stands today, the best idea is usually to ignore credit outlooks and credit watches altogether. That is, clean your data!
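To see why the attachments get in the way, here is a minimal sketch using a plain dictionary as a stand-in for a rating-to-score table. The `SCORES` excerpt is an illustrative assumption: only the BBB- entry (score 10) is taken from the text above, and the neighbouring values follow the usual convention of one notch per score. This is not how pyratings stores or looks up its scores.

```python
# Illustrative excerpt of a rating-to-score table (an assumption made for
# this sketch; only BBB- = 10 is stated in the text above).
SCORES = {"BBB+": 8, "BBB": 9, "BBB-": 10, "BB+": 11}

raw_rating = "BBB- *-"  # BBB- with a negative watch attached

# The raw string is not a key of the table, so the lookup finds nothing.
print(SCORES.get(raw_rating))

# Keeping only the part before the first whitespace recovers the plain rating.
clean_rating = raw_rating.split()[0]
print(clean_rating, SCORES.get(clean_rating))
```

The score stays at 10: the watch is dropped, not translated into a lower score.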

There is at least one other reason why cleansing is necessary: unsolicited ratings.
An unsolicited rating is usually designated by the letter "u", which is attached directly to the actual rating, e.g. AA-u. To translate such a rating into a score and use it properly in any kind of computation, you need to get rid of this letter.

For all these cases, pyratings offers a function called get_pure_ratings. Its sole purpose is to clean ratings, i.e. remove watches/outlooks and the letter "u".

Before starting, let's import some libraries.

In [1]:

import pandas as pd
import numpy as np

import pyratings as rtg

Cleaning single ratings

In [2]:

unsolicited_rating = "BBB+u"
rtg.get_pure_ratings(ratings=unsolicited_rating)

Out[2]:

'BBB+'

In [3]:

rating_with_outlook = "AA *-"
rtg.get_pure_ratings(ratings=rating_with_outlook)

Out[3]:

'AA'
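The two results above can be approximated with plain string handling, which may help to build intuition for what "cleaning" means here. The sketch below is not the pyratings implementation: `strip_rating_suffixes` is a hypothetical helper, and it assumes that a watch or outlook is always separated from the rating by whitespace and that a trailing lowercase "u" always marks an unsolicited rating.

```python
def strip_rating_suffixes(rating: str) -> str:
    """Hypothetical helper: drop a watch/outlook and a trailing 'u'.

    Assumes the watch/outlook is separated from the rating by whitespace.
    """
    parts = rating.split()
    if not parts:
        # Empty or whitespace-only input: nothing to clean.
        return rating
    token = parts[0]  # everything before the first whitespace
    # A trailing lowercase "u" is taken to mark an unsolicited rating.
    return token[:-1] if token.endswith("u") else token


for raw in ["BBB+u", "AA *-", "AA- (Developing)"]:
    print(repr(raw), "->", repr(strip_rating_suffixes(raw)))
```

For real data, prefer get_pure_ratings over a hand-rolled helper like this one, since input formats may be more varied than the sketch assumes.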

Cleaning a `pd.DataFrame`

You can also pass a pd.DataFrame and have all cells cleaned at once. Note that the column headers get the suffix "_clean".

In [4]:

rtg_df = pd.DataFrame(
    data={
        "rtg_SP": [
            "BB+ *-",
            "BBB *+",
            np.nan,
            "AA- (Developing)",
            np.nan,
            "CCC+ (CwPositive)",
            "BB+u",
        ],
        "rtg_Fitch": [
            "BB+ *-",
            "BBB *+",
            pd.NA,
            "AA- (Developing)",
            np.nan,
            "CCC+ (CwPositive)",
            "BB+u",
        ],
    },
)

rtg_df

Out[4]:

	rtg_SP	rtg_Fitch
0	BB+ *-	BB+ *-
1	BBB *+	BBB *+
2	NaN	<NA>
3	AA- (Developing)	AA- (Developing)
4	NaN	NaN
5	CCC+ (CwPositive)	CCC+ (CwPositive)
6	BB+u	BB+u

In [5]:

rtg.get_pure_ratings(ratings=rtg_df)

Out[5]:

	rtg_SP_clean	rtg_Fitch_clean
0	BB+	BB+
1	BBB	BBB
2	NaN	<NA>
3	AA-	AA-
4	NaN	NaN
5	CCC+	CCC+
6	BB+	BB+
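For comparison, a similar result can be sketched with pandas alone. This is only an illustration of the idea (clean every string cell, pass missing values through, suffix the column names), not how pyratings works internally. The helper `strip_rating_suffixes` is hypothetical and assumes that watches and outlooks are separated from the rating by whitespace.

```python
import numpy as np
import pandas as pd


def strip_rating_suffixes(value):
    """Hypothetical helper: clean one cell, leaving missing values untouched."""
    if not isinstance(value, str):
        # NaN, pd.NA and other non-string cells are returned as they are.
        return value
    parts = value.split()
    if not parts:
        return value
    token = parts[0]  # everything before the first whitespace
    return token[:-1] if token.endswith("u") else token


raw_df = pd.DataFrame(
    {
        "rtg_SP": ["BB+ *-", np.nan, "AA- (Developing)", "BB+u"],
        "rtg_Fitch": ["BBB *+", pd.NA, "CCC+ (CwPositive)", "BB+u"],
    }
)

# Clean column by column, then mark the cleaned columns with a suffix.
clean_df = raw_df.apply(lambda column: column.map(strip_rating_suffixes))
clean_df = clean_df.add_suffix("_clean")
print(clean_df)
```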
