JSON and Common Web Encodings Demystified

Across the internet, web applications, browsers, and mobile applications send huge amounts of complex data. Some of this content is easy for us humans to read as it is written in clear text with no obfuscation, encoding, or encryption. But sending this “human-readable” text across the internet from mobile apps on our devices is inefficient. Additionally, some of the web queries we make on web sites would break the web applications we send them to if the text we sent was not encoded.

How is this OSINT-related?

As OSINTers, we need to recognize these different formats of encoded and transformed data so that we can extract meaningful OSINT data from it. I think you’ll see that, once you learn a little bit about the formats of this data, it isn’t intimidating at all.

If you’ve looked at other content on our blog and in our videos, you may have already seen content referencing JSON-formatted, Base64, and URL-encoded data. We wanted to bring some additional focus to these topics in this blog and explain them in a bit more detail.

Conversion Resource

I’m going to use the free, online (and offline), JavaScript-based web application CyberChef (https://gchq.github.io/CyberChef/) for all the conversions and transformations of data in this blog. It is a simple, fast, easy-to-use site that is extremely capable.

The idea with CyberChef is that you have operations (arrow 1 below) that can do things to some data you paste into the browser (arrow 3 below). You can make recipes (arrow 2) that do something to the input data to create the output data (arrow 4). Below, I’m using CyberChef to extract email addresses from text using the “Extract email addresses” operation as an example.

CyberChef Application Extracting Emails from Text

This simple example just hints at the power of this free tool. Read on to see how we can use it to extract, transform, and decode other formats of data!
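If you would rather script this kind of extraction than click through a web page, the same idea can be approximated in a few lines of Python. This is only a sketch: the pattern below is a simplified one I am assuming for illustration (it is not the expression CyberChef itself uses), and the function name and sample text are made up.

```python
import re

# Simplified pattern for illustration only; real email syntax is more
# permissive, and this is not the expression CyberChef itself uses.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return the unique email-like strings in text, in order of first appearance."""
    seen = set()
    results = []
    for match in EMAIL_PATTERN.findall(text):
        if match not in seen:
            seen.add(match)
            results.append(match)
    return results


sample = "Contact alice@example.com or bob@example.org. Again: alice@example.com"
print(extract_emails(sample))
```

Duplicates are dropped so that a long page that repeats one address does not flood your results.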

Encoding

Base64 Encoding

When we send some data across the internet, we need to transform it into characters that are compatible with web traffic. Turns out the characters A-Z, a-z, 0-9, and + and / are great substitutions. If you count them up (A-Z = 26, a-z = 26, 0-9 = 10, plus + and / = 2), you have 64 different characters to use. That is the 64 in Base64.
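The counting above is easy to check for yourself. The short Python sketch below builds the standard alphabet from the four groups and confirms the total; the variable name is just one I picked for the example.

```python
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, then "+" and "/".
BASE64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(BASE64_ALPHABET))       # 26 + 26 + 10 + 2
print(len(set(BASE64_ALPHABET)))  # every character is distinct
print(2 ** 6)                     # six bits can represent 64 different values
```

The last line hints at why 64 is a convenient number: each Base64 character stands for six bits of the original data.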

We are going to skip how Base64 conversions happen (but you can read about it at https://en.wikipedia.org/wiki/Base64) and focus on what Base64 strings look like and how to convert to and from it.

Recognizing Base64 Content

We mentioned above that Base64 encoded strings can have any of the characters A-Z, a-z, 0-9, and + and /. They might look like the string below:

SWYgeW91IGFyZSByZWFkaW5nIHRoaXMsIG15IGZyaWVuZCBKZW5ueSBjYW4gYmUgZm91bmQgYXQgdGhlIHBob25lIG51bWJlciArMSAyMTIgODY3IDUzMDkuIDopIA==

Notice the characters used? Notice the length? Notice the two equals signs (==) on the end? All of those things should signal to you that this might be Base64 encoded. Although Base64-encoded content does not HAVE to have equals signs on the end of it, it commonly will have one or two: the encoder adds them as padding when the length of the original input is not a multiple of three bytes.
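Those visual clues can be turned into a rough automated check. The function below is a heuristic sketch, not a definitive detector: the name looks_like_base64 is hypothetical, it assumes the string is padded (unpadded Base64 will be rejected), and short ordinary words made of the right characters can still pass.

```python
import base64
import binascii
import re

# One or more alphabet characters, optionally followed by one or two "=".
BASE64_SHAPE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(candidate: str) -> bool:
    """Heuristic: right characters, length a multiple of 4, and decodes cleanly."""
    candidate = candidate.strip()
    if not candidate or len(candidate) % 4 != 0:
        return False
    if not BASE64_SHAPE.fullmatch(candidate):
        return False
    try:
        base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


print(looks_like_base64("T1NJTlQ="))
print(looks_like_base64("This is a plain sentence."))
```

Treat a True result as "worth trying to decode", not as proof.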

Decoding Base64 Content

Unlike encryption, we can easily reverse something that is encoded without a password or key. This decoding process is straightforward. In CyberChef (https://gchq.github.io/CyberChef/), do the following:

  1. Remove any existing recipes by pressing the trash can icon (arrow 1)
  2. In the “Search…” field in the Operations section on the left of the window, type: from base64 (arrow 2)
  3. Drag the “From Base64” result in the Operations section into the Recipe section and let it drop (arrow 3)
  4. Paste the above Base64 encoded text into the Input pane (arrow 4)
  5. Your result will show in the Output section (arrow 5). [Note: We blurred the content in the Output section below so that you actually go and convert the above text to find out what it says. Sneaky, right?]

Using CyberChef to Base64 Decode a String

You can, of course, do the reverse of what we did and Base64-encode clear text by choosing the operation “To Base64” in CyberChef instead of “From Base64”.
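The same round trip can be done without CyberChef. Here is a minimal Python sketch using the standard library's base64 module; the sample text "OSINT" is my own, chosen so the exercise string above stays a surprise.

```python
import base64

clear_text = "OSINT"

# Base64 works on bytes, so encode the text to UTF-8 first.
encoded = base64.b64encode(clear_text.encode("utf-8")).decode("ascii")
print(encoded)  # T1NJTlQ=

# Decoding reverses the process; no password or key is involved.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # OSINT
```

Notice the single equals sign on the end: "OSINT" is five bytes long, which is not a multiple of three, so the encoder pads the output.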

URL or Percent-encoding

Recognizing URL/Percent-encoded Content

There is another encoding scheme that you have probably seen a bunch: URL or percent-encoding. This technique is used to encode characters in a URL that would cause the URL to be invalid if they were not encoded. Here we are talking about characters like spaces, slashes, colons, and others. Below is an example URL that has several characters percent encoded.

https://web.archive.org/web/https%3A%2F%2Fosintcurio.us%2F

URL/percent-encoding (https://en.wikipedia.org/wiki/Percent-encoding) takes specific characters and converts them into a “%XX” format where the “XX” is a pair of hexadecimal digits (0-9 and A-F) giving the value of the encoded byte. The hex letters can be upper or lower case.
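To see where the two characters after the percent sign come from, you can work out the hexadecimal value of a character's byte yourself. The helper below is a hypothetical name used only for this sketch, and it assumes UTF-8, which is what modern URLs normally use.

```python
def percent_encode_char(character: str) -> str:
    """Return the %XX form of one character (one %XX group per UTF-8 byte)."""
    return "".join(f"%{byte:02X}" for byte in character.encode("utf-8"))


for character in [":", "/", " "]:
    print(repr(character), "->", percent_encode_char(character))
```

A colon is byte value 58, which is 3A in hexadecimal, and that is why it shows up in URLs as %3A.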

Decoding URL/Percent-Encoded Content

You may be able to look at that example above and guess that the %3A is probably a “:” and the 2 %2F groupings are “//” because you noticed there is an “https” right in front of them and then a domain right after. You would be correct! Let’s go to CyberChef and see how to decode (and encode) percent-encoded content.

  1. Remove any existing recipes by pressing the trash can icon (arrow 1).
  2. In the “Search…” field in the Operations section on the left of the window, type: url decode (arrow 2)
  3. Drag the “URL Decode” result in the Operations section into the Recipe section and let it drop (arrow 3)
  4. Paste the above URL encoded text into the Input pane (arrow 4)
  5. Your result will show in the Output section (arrow 5)

CyberChef URL Decoding a URL
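The steps above can also be sketched in Python with the standard library's urllib.parse module, using the example URL from earlier in this section.

```python
from urllib.parse import quote, unquote

encoded_url = "https://web.archive.org/web/https%3A%2F%2Fosintcurio.us%2F"

decoded_url = unquote(encoded_url)
print(decoded_url)  # https://web.archive.org/web/https://osintcurio.us/

# safe="" tells quote() to encode "/" too; by default it leaves slashes alone.
print(quote("https://osintcurio.us/", safe=""))  # https%3A%2F%2Fosintcurio.us%2F
```

The unquote() call is the equivalent of “URL Decode”, and quote() goes the other way.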

JSON (JavaScript Object Notation)

The applications on our mobile devices and the advanced web sites we visit in our browsers constantly send and receive data in efficient formats that they can easily process or parse. JSON (JavaScript Object Notation) is one of the most popular formats used today.

Recognizing JSON Content

JSON data can be found in the source code of web pages, in XHR transmissions from your browsers (see this OSINT Curious 10 Minute Tip video for details on XHR and JSON), and in the traffic sent and received from our mobile device apps.

JSON content is easy to recognize as it uses curly braces ({ }), square brackets ([ ]), quotation marks ("), and colons (:) to define the data. Keep in mind that JSON data may be URL encoded and/or Base64 encoded too! Below is an excerpt of some JSON from the http://en.gravatar.com/webbreacher.json page:

{"entry": [{"id": "59074232","hash": "f2d706e909b07b4c80f4c2842e06519d","requestHash": "webbreacher","profileUrl": "http://gravatar.com/webbreacher","preferredUsername": "webbreacher","thumbnailUrl": "http://2.gravatar.com/avatar/f2d706e909b07b4c80f4c2842e06519d","photos": [{"value": "http://2.gravatar.com/avatar/f2d706e909b07b4c80f4c2842e06519d","type": "thumbnail"}], ... (this is not a complete JSON document)

The first thing you will notice is that this doesn’t look like a nice, human-readable web page. It isn’t supposed to! It is an easily parsed format for passing data between computer programs. You can see the first curly brace ({) that starts the JSON object. Then we have the “entry” parameter with the quotes around it. JSON data is usually in a “parameter”: “value” format (the JSON specification calls these name/value pairs). In the above example, you see the parameter “id” has a value of “59074232”.
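Because JSON is built for programs, pulling a single value out of it takes very little code. The Python sketch below uses a trimmed version of the excerpt above; I closed the brackets by hand so that it is a complete document, so treat it as an illustrative sample rather than the real page content.

```python
import json

# Trimmed, hand-completed sample based on the excerpt above (not the full page).
raw = (
    '{"entry": [{"id": "59074232", '
    '"preferredUsername": "webbreacher", '
    '"photos": [{"value": "http://2.gravatar.com/avatar/f2d706e909b07b4c80f4c2842e06519d", '
    '"type": "thumbnail"}]}]}'
)

data = json.loads(raw)

# "entry" holds a list, so take the first item, then look values up by name.
first_entry = data["entry"][0]
print(first_entry["id"])
print(first_entry["photos"][0]["type"])
```

Curly braces become Python dictionaries and square brackets become lists, which is why the lookups alternate between names and positions.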

JSON in Mozilla versus Chrome

While the above JSON example doesn’t look too intimidating, it sure isn’t easy to read, is it? We are viewing the raw format of JSON above. If you have Google Chrome, this is your default view. Mozilla Firefox has a built-in JSON viewer and should parse the data so it is easier for us humans to read (see below). Arrow 1 shows the “human-readable” format, and arrow 2 points to the raw data format in case you don’t want the content “prettified” for us humans. There are extensions you can add to Chrome to prettify the JSON (for example: JSON Viewer).

Mozilla Firefox Displaying JSON Content

Decoding JSON

Turns out that we already know a tool that can decode JSON… CyberChef!

  1. Remove any existing recipes by pressing the trash can icon (arrow 1).
  2. In the “Search…” field in the Operations section on the left of the window, type: json (arrow 2)
  3. Drag the “JSON Beautify” result in the Operations section into the Recipe section and let it drop (arrow 3)
  4. Paste the JSON text from http://en.gravatar.com/webbreacher.json into the Input pane (arrow 4)
  5. Your result will show in the Output section (arrow 5)
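If you want the same kind of result as “JSON Beautify” from a script, Python's json module can re-indent a document. This is a small sketch with a shortened sample string of my own (based on the Gravatar fields shown earlier); the four-space indent is an arbitrary choice.

```python
import json

compact = '{"entry": [{"id": "59074232", "preferredUsername": "webbreacher"}]}'

# Parse the text, then write it back out with one item per line.
pretty = json.dumps(json.loads(compact), indent=4)
print(pretty)
```

The data does not change at all; only the whitespace does, so the prettified text parses back to exactly the same content.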

Bonus… Unix Time Conversion!

Another wonderful conversion CyberChef performs is moving to and from Unix or epoch time. If you ever find an integer that looks like this: 1565806706, it may be the number of seconds that have elapsed since 00:00:00 UTC on January 1, 1970. That starting point is called the Unix epoch, and the count itself goes by Unix time, epoch time, and other names.

Conversion in CyberChef (and a huge variety of other tools) is simple.

  1. Choose the “From UNIX timestamp” operation on the left and move it to the Recipe pane as you did above
  2. Insert your time stamp number in the Input field
  3. Get your date and time in the Output field

CyberChef Converting a Unix Timestamp
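Python is one of that huge variety of other tools. The sketch below converts the example timestamp from above and then converts it back; it assumes the value is in seconds (some applications store milliseconds instead, which would need dividing by 1000 first).

```python
from datetime import datetime, timezone

timestamp = 1565806706

# Interpret the integer as seconds since the Unix epoch, in UTC.
moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
print(moment.isoformat())  # 2019-08-14T18:18:26+00:00

# Going the other way: a timezone-aware datetime back to epoch seconds.
print(int(moment.timestamp()))  # 1565806706
```

Passing tz=timezone.utc matters: without it, Python shows the time in your computer's local time zone, which can shift the answer by hours.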

That’s It?

Of course not. There are a huge number of other things CyberChef and tools like it can do with the data you find. This should give you a head start to begin examining how they can help you go farther and deeper in your assessments.
