Hey guys, I tried to analyse a vs4 file and found some interesting stuff. Unfortunately, none of this is the 'breakthrough' we had hoped for.
I checked three recent opponents with ids 4351, 4363 and 4370.
The video obviously takes up almost all of the file, but there is more data after it ends.
The last part of the AVI file structure is the Index Entries chunk (hex 69 64 78 31, which is ASCII idx1), followed by the list of data chunks and their locations in the file (you can read more about it here: https:// learn.microsoft.com/en-us/windows/win32/directshow/avi-riff-file-reference#avi-index-entries).
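If anyone wants to locate the end of the video programmatically, here is a rough Python sketch. It assumes the vs4 file starts with a standard RIFF header (4-byte tag, 4-byte little-endian size), which I have not verified for every file. The function names riff_end and idx1_span are just my own, not anything from the game.

```python
import struct

def riff_end(data: bytes) -> int:
    """Return the offset just past the leading RIFF chunk(s) in data.

    Assumption: the file begins with one or more back-to-back RIFF chunks.
    """
    pos = 0
    while data[pos:pos + 4] == b"RIFF" and pos + 8 <= len(data):
        (size,) = struct.unpack_from("<I", data, pos + 4)
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to an even length
    return pos

def idx1_span(data: bytes) -> tuple[int, int]:
    """Return (start, end) offsets of the idx1 chunk, header included."""
    # Search backwards from the end of the RIFF data for the last 'idx1' tag.
    start = data.rfind(b"idx1", 0, riff_end(data))
    if start < 0:
        raise ValueError("no idx1 chunk found")
    (size,) = struct.unpack_from("<I", data, start + 4)
    return start, start + 8 + size
```

If the size field in the header is right, everything from riff_end(data) onwards should be the extra data described in the rest of this post.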
The video section is followed by the main image of the opponent (the one you see when you pick an opponent to play against or in the shop). This is a plain JPG contained between the bytes FF D8 and FF D9 (inclusive). If you extract these bytes to a separate file, you can open it in an image editor.
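A minimal sketch of that extraction in Python. The function name extract_jpeg and the start_at parameter are hypothetical. I search for FF D8 FF rather than just FF D8 to cut down on false hits, which relies on my assumption that the start marker is immediately followed by another FF marker, as in ordinary JPEG files.

```python
def extract_jpeg(data: bytes, start_at: int = 0) -> bytes:
    """Return the first JPEG found at or after start_at, markers included."""
    soi = data.find(b"\xff\xd8\xff", start_at)  # start-of-image marker
    if soi < 0:
        raise ValueError("no JPEG start marker (FF D8) found")
    eoi = data.find(b"\xff\xd9", soi)  # first end-of-image marker after it
    if eoi < 0:
        raise ValueError("no JPEG end marker (FF D9) found")
    return data[soi:eoi + 2]  # +2 so the FF D9 bytes are kept
```

Pass the offset where the video ends as start_at so you do not match bytes inside the video data. If the image carries an embedded thumbnail, the first FF D9 may belong to that thumbnail and the result will be cut short.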
After the image comes an mp3 section with the 'strip music', XORed with a single byte value that differs per opponent. The beginning of this section contains a lot of padding that would be 00 in a normal mp3, but since 00 XOR value gives the value itself, the padding shows up as a run of that value. So to get the correct bytes you need to XOR every byte of the section with that value (for example, in hex, for 4351 it is 10, for 4363 it is B1 and for 4370 it is D7). The mp3 section seems to end with the text LAME3 followed by padding made of AA bytes. The last AA byte marks the end of the audio section. You can compare the extracted audio bytes with the file downloaded from the vsp website (remove spaces from https:// us2.torquemada-games.com/pobieralnia www/online /XXXX/XXXX.mp3 where XXXX is the opponent id).
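Here is a small Python sketch of the XOR step. The names guess_xor_key and xor_bytes are made up by me. The key guess assumes the start of the audio section really is mostly padding that should be 00, so the most common byte there is the key. That held for the three files I checked but may not hold for all of them.

```python
from collections import Counter

def guess_xor_key(section: bytes, sample: int = 4096) -> int:
    """Guess the key as the most common byte in the first `sample` bytes.

    Assumption: the section starts with padding that decodes to 00.
    """
    if not section:
        raise ValueError("empty section")
    return Counter(section[:sample]).most_common(1)[0][0]

def xor_bytes(section: bytes, key: int) -> bytes:
    """XOR every byte of section with the single-byte key."""
    table = bytes(i ^ key for i in range(256))  # lookup table for all 256 byte values
    return section.translate(table)
```

XOR is its own inverse, so the same xor_bytes call both decodes the audio and restores the original bytes if you apply it a second time.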
If you remove all the bytes mentioned so far (video, image, audio), we are left with only ~30 KB of gibberish data whose purpose I could not work out. If you XORed the whole file to decode the audio, XOR this remainder again to get its original bytes back, because it seems that only the mp3 section was XORed in the first place. The only thing I did notice is that the last <1000 bytes look more similar across those 3 files than the rest of the section (many printable characters with some repetition). Maybe someone who is more proficient at data mining could make sense of it. I think these last ~30 KB could store more metadata, including information about frames and gestures.
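For anyone who wants to poke at those trailing bytes, this is one rough way to pull out the printable parts and see what the files share. It is only an exploratory sketch: the names printable_runs and shared_runs and the minimum length of 4 are arbitrary choices of mine, and it says nothing about what the data actually means.

```python
import re

def printable_runs(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII (0x20-0x7E) at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

def shared_runs(tails: list[bytes], min_len: int = 4) -> set[bytes]:
    """Return the printable runs that appear in every one of the given tails."""
    sets = [set(printable_runs(t, min_len)) for t in tails]
    return set.intersection(*sets) if sets else set()
```

For example, calling shared_runs with the last 1000 bytes of each of the three files would list the strings common to all of them.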
I know this is not much, but maybe it will inspire others to dig deeper.
