PREFACE: I am not an expert on any of the following. I'm merely sharing my ideas and questions, almost stream-of-consciousness style, cuz sometimes when you ask wacky questions, you glean actionable intel. The following is a learn-as-I-go exercise, not definitive data and perhaps not even 100% correct. It's simply a work in progress that seemed to make sense to release into the wild, with all names removed.
The other day a friend hit me up with a link to a video.
That friend is someone I completely admire and he's one of the
smartest guys I've ever met. I had a ton of meetings that day, so I
couldn't immediately put on my Beats
and check it out. A few hours later, he IMs me: "Check out that video
yet?" "No, but I will." "Best presentation I've ever seen, I'm going
to buy every single one of his books!"
That was coming from someone who recommends at least one article or
video a week, so for him to come back like that, I knew I had to stop
what I was doing and make it a top priority. The video ended up being
one of the best I’ve seen as well. Right up there with my two long-time
favorites: "Lateral Movement" by Harlan Carvey and "Finding Unknown Malware" by Alissa Torres of "Malware can hide but it must run"
fame. The audio on "Finding..." has some issues, but I've watched it
so many times I've lost count, and it's worth putting up with the
less-than-perfect audio. Alissa actually did another, similar presentation,
"Detecting Persistence Mechanisms",
but I digress. So, after watching the video that my friend
recommended, I had a conversation with him. After our discussion, I
began to think about some things... If
organizations are only “watching” their network traffic for
English-language content, could they miss something? In other words, if the Chinese,
for example, have infiltrated your network, or are attempting to, they
may be writing code or binaries that contain Mandarin text encoded as
UTF-16, which uses 16-bit (2-byte) code units and so is not easily
detectable (out of the box) by most sensors if they only match single-byte ASCII strings.
So then I started to
think about all the hundreds of malware samples I’ve looked at in the
past year-and-a-half, and I can count on one hand the number of them
that had a Chinese signature.
I've also seen artifacts of chats
from unwanted guests already on networks, in English. So would it also
make sense to hunt for very specific Chinese language characters or
strings of characters?
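To make the "2 bytes per character" point concrete, here is a minimal Python sketch of what hunting for one very specific string might look like. It is only an illustration under my own assumptions: the function names, the list of encodings, and the idea of searching a raw byte buffer are all hypothetical choices, not anything a particular sensor does. The same term is encoded several ways, because a search for the UTF-8 bytes will not match the UTF-16 bytes.

```python
# Assumed list of encodings worth trying; adjust for your own environment.
ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be", "gb18030", "big5")


def encoded_variants(term):
    """Return {encoding: bytes} for every encoding that can represent term."""
    variants = {}
    for enc in ENCODINGS:
        try:
            variants[enc] = term.encode(enc)
        except UnicodeEncodeError:
            # e.g. Big5 cannot represent some Simplified Chinese characters
            continue
    return variants


def find_term(data, term):
    """Return a list of (encoding, offset) for each place term occurs in data."""
    hits = []
    for enc, needle in encoded_variants(term).items():
        start = 0
        while True:
            pos = data.find(needle, start)
            if pos == -1:
                break
            hits.append((enc, pos))
            start = pos + 1
    return hits
```

Calling find_term(blob, term) on the bytes of a file or a captured payload returns the encoding and offset of each hit. A search for exact terms like this avoids guessing at "anything that looks Chinese", which would be far noisier.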
Not having all of the answers, and again
not being an authority on any of this, I “phoned a friend” and ended up
sitting down with two of my favorite Mandarin character experts, which
of course led to even more questions :( ...
(1) Speaking only about binaries (not isolated strings or chats), if the
binaries go undetected, wouldn't they still need, in the end,
to run as machine instructions (what you'd read as Assembly), and if so, wouldn't you see them then?
(2) Based on (1) above, should one perhaps just be filtering on binary headers and looking at just the signatures?
(3) Would another approach be to search the binary itself (or its source code, if you have it) for Chinese language characters?
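For question (2), a rough sketch of what "filtering on binary headers" could mean for Windows executables is below. It assumes you already have the candidate bytes in hand (carved from traffic or read from disk), and the function name is hypothetical. It checks only two well-known markers: the "MZ" bytes at the start of the file and the "PE\0\0" signature at the offset stored at 0x3C.

```python
import struct


def looks_like_pe(data):
    """Return True if data starts with an MZ header that points at a PE signature."""
    # The DOS header is 0x40 bytes long and begins with the bytes "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # A bogus offset yields a short or empty slice, which compares unequal.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"
```

This only says "this is probably a PE file", nothing about whether it is malicious or what language it was built with. It is a cheap first filter before looking any deeper.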
What I learned was that the language of the binary is “usually” defined by
the resource section. You have the locale ID and/or language identifier
which tells you the language. For example, Locale 0x0409 is English (US), and
locales whose primary language ID is 0x04 are Chinese (such as 0x0004, 0x7C04, 0x0404, 0x0804). Or, for
example, Lang ID 0x09 is English, 0x0A is Spanish, etc. For YARA it
would be something like pe.language(0x09) for English.
Other codes: https://msdn.microsoft.com/en-us/library/windows/desktop/dd318693(v=vs.85).aspx
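The relationship between the locale values and the Lang IDs above can be sketched in a few lines of Python. A Windows language identifier packs the primary language into its low 10 bits and the sublanguage into the bits above that, which is why 0x0404 and 0x0804 both come out as primary language 0x04. The function names and the tiny lookup table here are my own, for illustration only, and the table covers just the three languages mentioned above.

```python
# Only the primary language IDs mentioned in the text; not a complete table.
PRIMARY_NAMES = {0x04: "Chinese", 0x09: "English", 0x0A: "Spanish"}


def split_langid(langid):
    """Split a 16-bit language identifier into (primary, sublanguage)."""
    primary = langid & 0x3FF   # low 10 bits
    sub = langid >> 10         # remaining high bits
    return primary, sub


def primary_name(langid):
    """Look up the primary language name, or 'unknown' if not in the table."""
    primary, _ = split_langid(langid)
    return PRIMARY_NAMES.get(primary, "unknown")
```

For example, split_langid(0x0409) gives primary 0x09 (English) with sublanguage 1, and split_langid(0x0804) gives primary 0x04 (Chinese) with sublanguage 2.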
One challenge could be that if you have employees who are Chinese, or offices in
China, your searches could result in multiple false positives
unless they are very specific. And of course my inquiry isn't really just
about China; that's merely one example. From there you could expand
your character searches to Arabic, French, Korean, Portuguese, Russian, and so on.
Yet another one of my trusted contacts, someone I often
bounce things off of, had previously advised me that using a language
scheme as an IOC is not going to generate meaningful data, period. So
next I sat down with one more person to discuss all of the above, and
quite frankly for a sanity check. My takeaways from that meeting were
(a) I wasn't crazy, and (b) there's one more possible angle, and it's
region-based. For example, malware written in VB may be seen as
elementary and frowned upon by higher-caliber threat actors, such as
those in Russia, where the more difficult programming languages are generally
more respected. That doesn't mean that malware
written in VB isn't from Russia, for example, but maybe it could help
narrow your initial search.
Lastly, a little bird told me that if you're going to find any of the above proactively, before the headlines hit, your answer may lie in hunting for behavioral anomalies, machine learning, and a whole heck of a lot of luck! Because, I was reminded, we have to be lucky all the time; they only have to be lucky once.