Masking log output in Log4j2

Users are demanding more and more privacy, and as a web developer you have to do your best to make sure that sensitive information does not leak. Most of you have security measures in place that protect against the most common attacks; however, one part of the application that is usually neglected is the logs. This is because logs are usually meant to remain internal. That is not always the case, and even if it were, in a big company with many developers not everyone should have access to sensitive data.

After writing this article, I made an open-source library that does this (and a lot more) efficiently and easily. It can be integrated into your project with little effort. Please look at the dedicated page for LogMasker – Log4j and Logback Masking Library. If you just want to know how to do it, read further!

You need to log information in your application, and it is important that you do. It will save you precious time when investigating incidents or trying to solve bugs. Still, in many instances, sensitive data like the user’s email or password may be present in the logs, so it is critical that this information gets masked before it is written. In this article, I will show how to easily and efficiently apply masking algorithms to all logs using Log4j 2.

I believe that this method is efficient and easy to use, and most importantly, it guarantees that all logs are processed before they are written. If you are wondering why anyone would log passwords, just imagine a complex system with a logging filter that logs all incoming request data, including form submissions. This is quite common in the industry, especially when multiple micro-services interact with one another.

A Log4j 2 Masking Converter

In a previous article, I showed how you can build a custom log pattern converter for Log4j. We will apply a similar approach here by creating a MaskingConverter class that also extends LogEventPatternConverter. This guarantees that every log message is processed, as long as the pattern in the Log4j configuration file uses the converter.

But first, we must create a masker. Since there will be multiple masking patterns, let’s make a LogMasker interface that has one method.

public interface LogMasker {
    // Masks sensitive data in place; maskChar is the replacement character, e.g. "*".
    void mask(StringBuffer stringBuffer, String maskChar);
}

Now, we can create our masking converter. It accepts multiple input parameters at creation (or options, as they are called in Log4j), which dictate which maskers are used. When a message is received, the maskers are applied one by one, and only then is the message written out.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.pattern.ConverterKeys;
import org.apache.logging.log4j.core.pattern.LogEventPatternConverter;
import org.apache.logging.log4j.core.pattern.PatternConverter;

@Plugin(name = "logmasker", category = PatternConverter.CATEGORY)
@ConverterKeys({"msk", "mask"})
public class MaskingConverter extends LogEventPatternConverter {
    // Maps each pattern option (e.g. %msk{email}) to the masker that handles it.
    private static final Map<String, LogMasker> OPTIONS_TO_MASKER = Map.of(
            "email", new EmailMasker(),
            "pass", new PasswordMasker());

    // Used when no options are given; Map.of does not guarantee iteration order.
    private static final List<LogMasker> ALL_MASKERS = new ArrayList<>(OPTIONS_TO_MASKER.values());

    private final List<LogMasker> maskers;

    protected MaskingConverter(List<LogMasker> maskers) {
        super("LogMaskerConverter", null);
        this.maskers = maskers;
    }

    // Log4j calls this factory method with the options from the pattern.
    public static MaskingConverter newInstance(String[] options) {
        if (options == null || options.length == 0) {
            return new MaskingConverter(ALL_MASKERS);
        }

        List<LogMasker> maskers = new ArrayList<>();
        for (String option : options) {
            LogMasker masker = OPTIONS_TO_MASKER.get(option);
            if (masker == null) {
                throw new IllegalArgumentException("Invalid option detected for masker: " + option);
            }
            maskers.add(masker);
        }

        return new MaskingConverter(maskers);
    }

    @Override
    public void format(LogEvent event, StringBuilder toAppendTo) {
        // Work on a mutable copy so every masker can edit the message in place.
        StringBuffer logMessage = new StringBuffer(event.getMessage().getFormattedMessage());
        for (LogMasker masker : maskers) {
            masker.mask(logMessage, "*");
        }
        toAppendTo.append(logMessage);
    }
}

So, let’s look at the code a bit. First, we define a map where the key is the option value that corresponds to a masking algorithm. We also have a List of maskers that holds all the algorithms defined.

In the newInstance() method, we determine what algorithms to use based on the options. If an invalid option is found, we throw an exception. If everything is OK, we create the instance of our MaskingConverter.

In the format() method, we create a StringBuffer from the formatted message and pass it to each masker in turn. Each masker alters the StringBuffer in place, so nothing needs to be returned and we avoid building a new String for every replacement. Once all maskers have run, we append the masked message to the StringBuilder that Log4j passes in, which holds the line that will be written to the console.

We also need to define this as a Log4j plugin using the @Plugin annotation and specify what converter keys are to be used using the @ConverterKeys annotation. I won’t go into more details since you can find them in the original article on how to make a custom message converter.
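To see the chaining idea from format() in isolation, here is a minimal sketch that runs without Log4j. The two toy maskers (DIGITS and SECRET_WORD) and the applyAll() helper are hypothetical names made up for this illustration; only the shape of the LogMasker interface comes from the article.

```java
import java.util.List;

class MaskerChainDemo {
    // Same shape as the LogMasker interface shown earlier.
    interface LogMasker {
        void mask(StringBuffer stringBuffer, String maskChar);
    }

    // Hypothetical toy masker: overwrites every digit in place.
    static final LogMasker DIGITS = (buffer, maskChar) -> {
        for (int i = 0; i < buffer.length(); i++) {
            if (Character.isDigit(buffer.charAt(i))) {
                buffer.setCharAt(i, maskChar.charAt(0));
            }
        }
    };

    // Hypothetical toy masker: overwrites every occurrence of the word "secret".
    static final LogMasker SECRET_WORD = (buffer, maskChar) -> {
        int idx;
        while ((idx = buffer.indexOf("secret")) >= 0) {
            for (int i = idx; i < idx + "secret".length(); i++) {
                buffer.setCharAt(i, maskChar.charAt(0));
            }
        }
    };

    // Mirrors what format() does: one mutable buffer, maskers applied in order.
    static String applyAll(String message, List<LogMasker> maskers) {
        StringBuffer buffer = new StringBuffer(message);
        for (LogMasker masker : maskers) {
            masker.mask(buffer, "*");
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(applyAll("pin 1234 is secret", List.of(DIGITS, SECRET_WORD)));
    }
}
```

Because every masker works on the same buffer, a later masker sees the output of the earlier ones, so the order of the options can matter when patterns overlap.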

Masking emails in Log4j2

Now come the masking implementations. First, the email one. We will use one regular expression to find emails in the log message and another to mask them. We still need SOME information, so we won’t mask the entire email address. The first and last characters of the local part and of the domain name are kept (this makes the log easier to analyze), as is the top-level domain. As an example, the email test.email@domain.com will be shown as t********l@d****n.com. Here I would like to give special thanks to Wiktor Stribiżew for his excellent answer on how to mask the email. You can find more masking patterns in his answer on StackOverflow.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.lang3.RegExUtils;

public class EmailMasker implements LogMasker {
    private final Pattern emailFindPattern = Pattern.compile("([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\\.[a-zA-Z0-9_-]+)");
    private final Pattern emailMaskPattern = Pattern.compile("(?<=.)[^@](?=[^@]*?[^@]@)|(?:(?<=@.)|(?!^)\\G(?=[^@]*$)).(?=.*[^@]\\.)");

    @Override
    public void mask(StringBuffer stringBuffer, String maskChar) {
        Matcher matcher = emailFindPattern.matcher(stringBuffer);
        int from = 0;
        // Loop so that every email in the message is masked, not only the first one.
        while (matcher.find(from)) {
            int start = matcher.start(1);
            String masked = RegExUtils.replaceAll(matcher.group(1), emailMaskPattern, maskChar);
            stringBuffer.replace(start, matcher.end(1), masked);
            from = start + masked.length();
        }
    }
}
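If you want to experiment with the two regular expressions outside of Log4j, the following sketch may help. It uses only java.util.regex and returns a new String instead of editing a StringBuffer, so it is a variation for experimenting rather than the masker itself. The class name EmailMaskSketch and the method maskEmails() are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailMaskSketch {
    // Same expressions as in EmailMasker, with backslashes escaped for Java.
    private static final Pattern FIND = Pattern.compile(
            "[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\\.[a-zA-Z0-9_-]+");
    private static final Pattern MASK = Pattern.compile(
            "(?<=.)[^@](?=[^@]*?[^@]@)|(?:(?<=@.)|(?!^)\\G(?=[^@]*$)).(?=.*[^@]\\.)");

    // Masks every email found in the line and leaves the rest untouched.
    static String maskEmails(String line) {
        Matcher matcher = FIND.matcher(line);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // The mask pattern is applied to the email alone, not to the whole line.
            String masked = MASK.matcher(matcher.group()).replaceAll("*");
            matcher.appendReplacement(out, Matcher.quoteReplacement(masked));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskEmails("from test.email@domain.com to john@example.org"));
    }
}
```

Applying the mask pattern only to the extracted address matters: the pattern relies on the start and end of its input, so running it over a whole log line would give different results.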

Masking passwords in Log4j2

Using a similar approach, we need to mask any passwords. The code is slightly different here for two reasons. First, we have multiple keywords we need to search for (password or pwd for example). Secondly, we need to always mask with the exact same string so that we don’t reveal how many characters the password has. Still, the fundamentals are the same.

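import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PasswordMasker implements LogMasker {
    // Fixed-length replacement, so the length of the real password is not revealed.
    private static final String MASKED = "password: ******";

    private final Pattern passwordFindPattern = Pattern.compile("(?i)((?:password|pwd)(?::|=)(?:\\s*)[A-Za-z0-9@$!%*?&.;<>]+)");

    @Override
    public void mask(StringBuffer stringBuffer, String maskChar) {
        Matcher matcher = passwordFindPattern.matcher(stringBuffer);
        int from = 0;
        // Loop so that every password in the message is masked, not only the first one.
        while (matcher.find(from)) {
            int start = matcher.start(1);
            stringBuffer.replace(start, matcher.end(1), MASKED);
            from = start + MASKED.length();
        }
    }
}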
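One possible variation, shown here only as a sketch, is to keep whatever keyword and separator appeared in the log (pwd=, Password:, and so on) and replace just the value with a fixed-length mask. This is not the code from this article: the class name PasswordValueMaskSketch and the method maskPasswords() are hypothetical, and the pattern is a slightly regrouped version of the one above.

```java
import java.util.regex.Pattern;

class PasswordValueMaskSketch {
    // Group 1 captures the keyword, the separator and any following whitespace.
    private static final Pattern PASSWORD = Pattern.compile(
            "(?i)((?:password|pwd)[:=]\\s*)[A-Za-z0-9@$!%*?&.;<>]+");

    // Keeps group 1 as logged and replaces only the value with a fixed mask.
    static String maskPasswords(String line) {
        return PASSWORD.matcher(line).replaceAll("$1******");
    }

    public static void main(String[] args) {
        System.out.println(maskPasswords("pwd=abc123 and Password: hunter2"));
    }
}
```

The mask still has a constant length, so the password length is not revealed, but the output stays closer to what was originally logged.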

Trying everything out

Now, let’s try our code. We create a Log4j configuration file and, in the layout pattern, use %msk instead of %m for the message. The packages attribute tells Log4j which package to scan for our plugin.

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN" packages="com.ppopescu.logging">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level - %msk{email}{pass}%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console"/>
        </Root>
    </Loggers>
</Configuration>

Next, let’s write a main() method where we log a few strings:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class Main {
    public static void main(String[] argv) {
        Logger logger = LogManager.getLogger(Main.class);
        logger.info("This will mask the email test.email@domain.com");
        logger.info("This is a password: testpass");
        logger.info("Another password:testpass2. That will be masked");
        logger.info("Nothing will be masked here");
    }
}

As you can see, we have an email address there as well as passwords. If we run the code now, we get it nicely masked.

20:44:56.680 [main] INFO  - This will mask the email t********l@d****n.com
20:44:56.688 [main] INFO  - This is a password: ******
20:44:56.689 [main] INFO  - Another password: ****** That will be masked
20:44:56.691 [main] INFO  - Nothing will be masked here

Other Considerations

First, keep in mind that this adds processing to every log call. The more LogMaskers you have, the longer it takes to process each logged line. I tried to keep things fast by creating as few objects as possible and by using mutable objects wherever possible. I also compile each regular expression once and reuse the Pattern, together with StringUtils and RegExUtils from Apache Commons Lang, so nothing is recompiled for each logged line.

Secondly, other improvements may be made. Someone who knows regular expressions really well may be able to create just ONE masker that does all of this in a single sweep of the logged line. Sadly, I am not that good with regular expressions. If you do have such an implementation and are willing to share it, please contact me and I will gladly share it here.
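As a starting point for such a combined masker, here is a rough sketch and not a finished solution. It joins the email and password patterns into one alternation with named groups, so the line is scanned only once; the email is still masked by a second expression applied to the matched address, so it is not a pure single-regex approach. The class name SinglePassMaskSketch, the method mask() and the group names are all made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassMaskSketch {
    // One alternation: either an email, or a password keyword followed by its value.
    private static final Pattern COMBINED = Pattern.compile(
            "(?<email>[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\\.[a-zA-Z0-9_-]+)"
            + "|(?<key>(?i:password|pwd)[:=]\\s*)(?<secret>[A-Za-z0-9@$!%*?&.;<>]+)");

    // Same email mask expression as in EmailMasker.
    private static final Pattern EMAIL_MASK = Pattern.compile(
            "(?<=.)[^@](?=[^@]*?[^@]@)|(?:(?<=@.)|(?!^)\\G(?=[^@]*$)).(?=.*[^@]\\.)");

    static String mask(String line) {
        Matcher m = COMBINED.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Decide per match which branch of the alternation was hit.
            String replacement = m.group("email") != null
                    ? EMAIL_MASK.matcher(m.group("email")).replaceAll("*")
                    : m.group("key") + "******";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("login test.email@domain.com password: testpass ok"));
    }
}
```

Whether this is actually faster than two separate maskers would need to be measured; the alternation makes each match attempt more expensive even though the line is traversed only once.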

Lastly, keep in mind that these are only examples. The regular expressions used here can probably be improved, and more maskers can be added based on your needs. As an example, an IP masker may be useful for some. Feel free to add one.
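To give an idea of what an IP masker could look like, here is a minimal sketch under a few assumptions: it only handles IPv4, it keeps the first two octets and masks the last two, and it does not check that each octet is in the 0-255 range, so something like a four-part version number would be masked too. The IpMasker class is hypothetical and not part of the downloadable code; the LogMasker interface is repeated only to keep the example self-contained.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IpMaskerSketch {
    // Same shape as the LogMasker interface from this article.
    interface LogMasker {
        void mask(StringBuffer stringBuffer, String maskChar);
    }

    static class IpMasker implements LogMasker {
        // Group 1 holds the first two octets, which are kept in the output.
        private static final Pattern IPV4 = Pattern.compile(
                "\\b(\\d{1,3}\\.\\d{1,3})\\.\\d{1,3}\\.\\d{1,3}\\b");

        @Override
        public void mask(StringBuffer stringBuffer, String maskChar) {
            String safeMask = Matcher.quoteReplacement(maskChar);
            String masked = IPV4.matcher(stringBuffer.toString())
                    .replaceAll("$1." + safeMask + "." + safeMask);
            // Write the result back so the caller's buffer is updated in place.
            stringBuffer.replace(0, stringBuffer.length(), masked);
        }
    }

    public static void main(String[] args) {
        StringBuffer line = new StringBuffer("Request from 192.168.10.25 accepted");
        new IpMasker().mask(line, "*");
        System.out.println(line);
    }
}
```

To use something like this with the converter, you would register it in OPTIONS_TO_MASKER under a new option key of your choosing and add that key to the pattern.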

And as always, you can download the full code here: Masking logs in Log4j 2 Source Code

