An introduction to why your game is slow - Part 1

EverCursed · May 15, 2019

Part 2 is now up: https://f95zone.to/threads/why-your-game-is-slow-part-2.30341/

This will be a fairly long technical post that requires some programming knowledge. For most of my examples I will be using C, and I may add applications of my points in some other languages (Python, Java, etc). However, no matter what language the examples are in, the theory applies to absolutely any language without exception. I use C because it gives more control over how your program behaves compared to other languages like Python.

The reason for me writing this is as follows. I recently played a game from this site on my phone (I won't point out which). Any time video ran, it played at around 0.5-2fps. So I thought I could write up a little guide to some possible performance problems in games. If you don't care about that, it is still nice to know no matter what you do.

Introduction

In my last semester of university, I had an assignment to write a web crawler that also brute forces passwords. This was also a group project. I started the project, outlined the structure of the program, and shared it with my teammates.

The next thing that happened was that my teammates revised my code. They made two major changes: 1. They made the code object-oriented, and 2. They changed the link storage system from an array to a linked list.

At this point I realized that students are not actually taught computer architecture in depth. Therefore, I thought of doing a small series about programming suited to those who have already done some.

Computer Architecture

Before anything else, it is important to understand how computers work. I expect that pretty much everyone understands the basic structure of the computer. It can be summarized like so:

The CPU is the computer's brain.
RAM stores data and is very quick to access.
Hard drives (HDDs) or solid state drives (SSDs) are for permanent storage. They are slower than RAM but hold data while the computer is turned off.

It is now my pleasure to let you know that this model is outdated. This is how computers worked about 30 years ago.

The current computer model looks something like this:

The CPU is the computer's brain.
RAM stores data and is very slow to access.
Hard drives (HDDs) or solid state drives (SSDs) are for permanent storage. They are lethally slow but hold data while the computer is turned off.
CPUs have hierarchies of caches that store recently used data. They are faster than RAM and much smaller than it.

The main reason for this change is that computers have started to approach the limits of physics. Look at the end if you are interested in the reason for this.

Caches

Let's start with an example:

C:

int array[1024][1024];
for(int outer = 0; outer < 1024; outer++)
{
    for(int inner = 0; inner < 1024; inner++)
    {
        array[inner][outer] = 0;
    }
}

Those who have done structured programming will likely see the issue with this code.

If we visualize how the array is laid out in memory, it looks something like this:

Code:

 address        index
----------------------------
0x00000000    array[0][0]
0x00000004    array[0][1]
0x00000008    array[0][2]
0x0000000C    array[0][3]
0x00000010    array[0][4]
0x00000014    array[0][5]
0x00000018    array[0][6]
0x0000001C    array[0][7]
0x00000020    array[0][8]

   ....           ....

0x00000FF8    array[0][1022]
0x00000FFC    array[0][1023]
0x00001000    array[1][0]
0x00001004    array[1][1]
0x00001008    array[1][1]

   ....           ....

0x003FFFF0    array[1023][1020]
0x003FFFF4    array[1023][1021]
0x003FFFF8    array[1023][1022]
0x003FFFFC    array[1023][1023]

Our loop first loops through the outer dimension and then through the inner dimension. It starts at array[0][0], next goes to array[1][0], array[2][0]... Or, to put it another way, it visits addresses 0x00000000, 0x00001000, 0x00002000, 0x00003000...

So why is this slow? Wasn't the point of an array that you could randomly access different locations?

Well yes, but there are two major problems.

The first problem is that accessing memory is far slower than you might think. As the speed of the CPU increased, memory has failed to catch up. See at the end for the simple reason for why this is. For now all we care about is whether it is slow enough to matter. Here is a table for very rough latencies of different memory access operations on a modern computer.

Operation	Latency
L1 cache reference	0.5 ns
L2 cache reference	7 ns
Main memory reference	100 ns

Partially stolen from

You must be registered to see the links

.

To put this in perspective, think of your average AAA game. Runs at 60fps. This means it has 16.666ms to do physics, rendering, loading assets, audio queuing, input processing. You cannot waste 100ns to read a single value.

The second problem is that memory is retrieved in chunks of a specific size. The exact size depends on the architecture, but most common one is 64 bytes as of now. You do not have a choice in this. If you reference a single byte in memory, you will fetch 64 bytes. What will also happen is that the entire chunk (also called cache line) will be placed in the CPU cache. This is done so that you don't need to waste time retrieving 64 bytes again when you reference that same byte or a byte close to it. That being said, it isn't actually that simple. More on this later.

Going back to the example now. Let's look how it performs, versus how it would perform if we flipped the order of our array accesses to array[outer][inner].

array[inner][outer] array[outer][inner]

Code:

  address        index             cache
-------------------------------------------------
0x00000000    array[0][0]    cache miss - 100ns
0x00001000    array[1][0]    cache miss - 100ns
0x00002000    array[2][0]    cache miss - 100ns
0x00003000    array[3][0]    cache miss - 100ns
0x00004000    array[4][0]    cache miss - 100ns
0x00005000    array[5][0]    cache miss - 100ns
0x00006000    array[6][0]    cache miss - 100ns
   ....          ....                ....

Code:

  address        index             cache
-------------------------------------------------
0x00000000    array[0][0]    cache miss - 100ns
0x00000004    array[0][1]    in cache - 0.5ns
0x00000008    array[0][2]    in cache - 0.5ns
0x0000000C    array[0][3]    in cache - 0.5ns
0x00000010    array[0][4]    in cache - 0.5ns
0x00000014    array[0][5]    in cache - 0.5ns
0x00000018    array[0][6]    in cache - 0.5ns
   ....          ....                ....

Now there is one more piece of information that we should know. That is the fact that the hardware is actually quite a bit smarter than we give it credit. It will detect memory access patterns and will speculatively prefetch cache lines before you ask for them. The important thing to note here is that memory accesses need to be regular: every 1 bytes, every 2 bytes, 4 bytes, 8 bytes, etc. If it is regular, the hardware will detect it and will start fetching cache lines you might need soon while the CPU is doing other operations.

Because of this our new array access pattern will likely not hit a cache miss after the first one. It will keep fetching more cache lines while we are going through each 64 bytes of data at a time. Our old version however can't benefit from this. That's because the memory accesses are so far apart, that even though our memory access pattern is regular, we don't do enough work to take up the 100ns needed for the next cache line to be fetched. Because of this we will hit a cache miss every time. To make things even worse, when we finish looping through one dimension once, and increment the other dimension, we basically return to the start of the array. While we have already fetched that cache line, it is possible that the cache line has already been flushed and will need to be fetched again.

How Does this Help?

Let's see how this applies to a few popular languages.

However, I am not in any way an expert in the following languages. Most of this is going to be from guesses, things I heard, random articles, etc. Please let me know if I make a mistake.

Python

One thing I will recommend is to avoid using standard python data structures. For example, lists. It may seem like they are similar to arrays. They can be accessed in order and are internally dynamic arrays. Sounds good so far.

What's important though is that they are dynamic arrays of pointers. Which means you will get the pointer likely without a cache miss, but will then have to dereference it leading to a... cache miss. The problem is that we don't know where it points to and if those memory locations are regular. In other words, there will likely be a problem with prefetching.

As far as I'm aware, NumPy arrays seem to deal with these problems.

C++

Not much to say here. The language is absolutely disfigured by the amount of unfortunate decisions the committee has made. A lot of useful structures are actually linked lists internally. See the end for a rant about why this is literally worse than AIDS (though I think you can guess by now).

Java

I thought about adding explanations about the sorts of issues there are with Java, but I realized that I have not talked about everything I need to explain these. Alas our journey into cache understanding has only just begun.

Why do we care?

For simple software, we don't. It's as simple as that. It is unlikely that your program will be complex enough for this to matter.

I can only say this. Have you ever thought about why your phone battery can't last more than a day? The problem is that we don't know how to make the CPU use less energy. We only know of one way to do that, and it's to turn it off. If you ever looked at how your CPU performs, you will see that the frequency of it varies depending on what it's doing. This is because when there isn't much to do, the CPU will just stop working, and continue some time later. However, when the CPU is waiting for memory, it does not sleep.

Addendum

Limitation of Physics

This is an interesting one. If you look at a motherboard and look at the distance between the RAM sticks and the CPU, I would assume the distance would be around 10cm / 4in. I recommend you go on Google and type this in: speed of light / 10cm. Look at the result. If this is smaller than your CPU frequency, congratulations, it takes more time for a signal to reach the memory from the CPU than it takes for a CPU cycle to complete. And this is only one way. It also doesn't include electrons dancing through the RAM stick gathering the cache line. And it also doesn't include the fact that electricity moves slower than the speed of light.

We don't know how to make faster memory. That's why we started sticking memory directly into the CPU, which is called cache.

Linked Lists

This is the number one cause of all performance problems. If you use a linked list, please remove it. While asymptotic complexity of traversing a linked list is the same as traversing an array, it does not consider that you will get a cache miss every time you move to the next node. Asymptotically it's the same, practically its 200 times slower.

It is the same as the worse version of our array traversal, except that it also has to allocate memory when it inserts. It allocates it on the heap, likely causing internal, and potentially external fragmentation.

The only time linked lists are acceptable is when your program inserts into the list many times, and then reads the entire list once, or somewhere around that order of magnitude. That's what it's designed for. Insertion is easy and compared to other data structures, and also fairly quick. Reading is painfully slow.

What else is there?

To be honest quite a lot. This was an insultingly small overview of caches. There is still a lot to cover, and that's just about caches. Here's a small list.

Instruction cache vs data cache
Cache hierarchy (L1, L2, L3 caches, how many of each)
Multi-threading and caches
Object oriented design and caches
Data structure alignment
What can the CPU do in one cycle?

Part 1 End

For this part I would recommend you to look through the documentation for the language you use. See how various data structures are implemented. See if the implementation fits your needs.

anne O'nymous · May 15, 2019

Sorry, but I'll be this guy...

All your arguments are mostly rights, but the conclusion you get from them come from another time that I'll summarize as : the last century.

EverCursed said:
I recently played a game from this site on my phone (I won't point out which). Any time video ran, it played at around 0.5-2fps.

You address a false problem. Whatever the engine used for the game, I'm 10000% sure that not a single game available here use a homemade code to play videos.
To this must be added that the effective fps of a video depend of many factors :

The fps defined when the video was encoded ;
The codec used to encode the said video ;
The availability of hardware acceleration for this codec ;
The amount of memory available to store the video ;
The size of the cache of the mass storage device ;
The speed of the mass storage device ;
The system's load ;
The efficiency of the player.

Accusing 1 of these 8 factors don't feel this right.

EverCursed said:
To put this in perspective, think of your average AAA game. Runs at 60fps. This means it has 16.666ms to do physics, rendering, loading assets, audio queuing, input processing. You cannot waste 100ns to read a single value.

Sorry, but no, no and no. Not a single AAA game performing at 60 fps have 16.666ms to perform all it need to do. Simply because there isn't a single nowadays AAA game that run on a mono-task OS. Whatever the OS, they all are multi-task, which imply that the game will never have 1/60 of second to works, but only 1/60 of the time that the OS will allow to it. And globally this time will never be higher that 8/10 of second. Still, those games achieve to perform at more than 60fps while using an engine mostly wrote in C++/C#, and for many of them the code of the game itself is wrote in an interpreted pseudo-language.
Only critical code is effectively optimized nowadays, wrote in C for the most part, and falling to Assembler for anything addressing directly the display. Everything else is wrote in an Object Oriented Language and it's enough because nowadays OOL operations are mostly atomic.
To this must be added the fact that, no, it don't works like you said, simply because you are thinking in mono-thread. What is done during the 1/60 of the time allowed to the game is :

The physics ;
The rendering ;
A portion of the audio treatment ;
A portion of the input treatment ;
A portion of the game script processing.

EverCursed said:
One thing I will recommend is to avoid using standard python data structures. For example, lists. It may seem like they are similar to arrays. [...] What's important though is that they are dynamic arrays of pointers.

They are dynamic arrays of pointers to objects, which imply that they are, more or less, three time slower than you think. But they still aren't to avoid.

EverCursed said:
The problem is that we don't know where it points to and if those memory locations are regular.

We know for sure that those memory locations aren't at all regular. Firstly because it's not C, the arrays aren't typed and so one cell can be a number, will the second while be a string. Secondly because it's Python, all data are none mutable, so they are recreated every time the value will change. This imply a new memory allocation and the guaranty that each cell of a list will point anywhere in the memory stack. It also guaranty that the language is slowed by its garbage collection.
Native Python is one of the slowest language, but still fast enough to handle a 3D modeling/rendering engine like Blender. Obviously, like said above, the critical part are optimized, and so not in Python, but the gain in speed is impacted by the fact that it's an interpreted language, which imply a two steps process before effectively reaching the optimized code.
It doesn't mean that Python is fast enough to handle a 3D game at 60fps, just that the latency implied by the language design isn't this impacting with nowadays computers ; in fact it isn't also on 10 years old computers.

EverCursed said:
As far as I'm aware, NumPy arrays seem to deal with these problems.

You know that NumPy

You must be registered to see the links

, right ?

EverCursed said:
I can only say this. Have you ever thought about why your phone battery can't last more than a day?

The answer is simple : Because your phone isn't designed for playing.

The fact is that nowadays computers are strong enough for us to not care anymore about optimization, except some critical points. Which lead to a new (implicit) paradigm started by OOL and push to its limit with Python : we don't anymore accelerate the way the code will play, we accelerate the way it's wrote ; the goal not being anymore to have the fastest code, because it will always be fast enough, but to have the easiest and clearer one.

EverCursed · May 15, 2019

Thanks for your comment, and I ask you to always be that guy! There's no point in monologues.

First point:
The game I played was a Ren'Py game. My phone is quite decent by current standards. And other Ren'Py games work just fine.

Second point:
I have not talked about multithreading yet, and it's actually not as simple as "you have 4 threads, so your program runs this many times faster". The caching architecture actually causes some really bad and insidious performance issues with multithreading in some cases. Though in general it does give a performance boost. More on this next time!

Third point:
Haha I can't really say anything there. I do make it seem like it's a bit more of a problem than it actually is. But I do have a reason for this. If you want to use them, go ahead. It's entirely up to you.

It's up to you, until you start noticing that your program is fairly slow, but nothing really stands out in the profiler. For simple programs this likely won't happen, but once you get into working on serious applications, this will likely happen more often than you think.

And in general profilers can't tell you that your problem is in caches. When it shows CPU utilization, that includes in it the time the CPU is spending waiting for memory, aka CPU idling. That's why this problem is invisible. You can't really see it, but its always there. I'll see if I can get some data on this!

Fourth point:

You must be registered to see the links

And let me tell you, that having looked at the source code, they are caching up the ***.

Fifth point:
I have absolutely no working knowledge of NumPy. It just looked to me like something that would work. I don't know if it's borked some other way...

Sixth point:
Well even without playing, there are far too many people in the world that have to charge their phones multiple times a day. And following the recent trend of phones getting thinner (and therefore the batteries getting smaller), I don't really expect this to change any time soon. And seeing many discussions on the internet about iPhone battery life being too small, I feel quite comfortable making that statement.

Overall I'll say this. I think many people make two underestimations. One: they underestimate how fast computers actually can be. And two: they underestimate how much of that potential is wasted. You can see up to 5:30 in

You must be registered to see the links

for just how large the difference can be between using OOD and DOD.

The fact that most companies care more about how quickly it takes to develop something shouldn't really surprise anyone. They want to save money. That's all it is. They make a choice to sacrifice the quality of their product to save some money and be first on the market. For them its a correct decision. But I really don't give a f*** about how much money they want to make. I at the very minimum care about the time I waste trying to open a simple application. Have you tried opening Visual Studio lately? You have to wait 30 seconds for it to launch, and then if you dare to use the UI, you have to wait another 10 seconds for it to load. FFS, it's just a text editor with a compiler. But they decided to save time and throw everything they can in there, Windows COM components, an SQL database (no joke). And then when you try to update it, it pretty much never works. Thanks, I'll stay away.

Alright, phew, rant over.

Winterfire · May 15, 2019

The explanation for that might be much simpler than the code being inefficient, at least when it comes to Unity, I have seen people that load up the game with no care or even ignore error printing or spamming which slows the game A LOT.
When it comes to Ren'Py, I have no idea if that is also the case but I wouldn't be surprised if the problem lies simply on the video player if that's the only place where your fps were 2.

EverCursed · May 15, 2019

Oh, I have absolutely no idea what the actual problem is. I don't want to just tell one developer what he did wrong... It's really just as as simple as "I want people to know how a computer actually works".

Edit: Okay I just looked at the apk for that game, and it seems to me that they play video by just stitching together a bunch of PNGs? I don't see any video decoding libraries or anything, hmm...

Winterfire · May 15, 2019

Yeah that's one of ways (not sure if the most common) to play animations on Ren'Py or so I have read.

anne O'nymous · May 15, 2019

EverCursed said:
First point:
The game I played was a Ren'Py game. My phone is quite decent by current standards. And other Ren'Py games work just fine.

Which will lead to a question : Are you even sure that it was a video ?
All what is animated in Ren'py isn't necessarily a video. Not only it could explain why it have a so small fps rate, but also it's more accurate with Ren'py capabilities. It can have problem with heavy videos, but will not fall this slow in fps unless the memory is already over saturated.
I mean, Ren'py use libavcodec for its multimedia part, not necessarily the best, but far to be the worse ; but, more important, it's a library with enough past to now be as efficient as needed.

EverCursed said:
Second point:
I have not talked about multithreading yet, and it's actually not as simple as "you have 4 threads, so your program runs this many times faster". The caching architecture actually causes some really bad and insidious performance issues with multithreading in some cases. Though in general it does give a performance boost. More on this next time!

The point wasn't in the speed, but in the division of the task. You can take three "OS tasks cycle" to perform a background task, it will have almost no influence on the game speed. But you need to perform all the physic and rendering in one cycle if you want an efficient frame rate. Splitting the process in at least two threads permit to achieve this, giving the priority to the thread in charge of the display, and letting the thread in charge of the background taskes deal with whatever time it still rest to the process. This while staying in mono thread will force you to have a strongest division and waste time waiting for the background task to effectively have something to do.
It also permit to have maxed background task, and so keep a constant maximal speed whatever the effective speed of the processor ; which is needed to keep the game playable whatever how old it is.

EverCursed said:
It's up to you, until you start noticing that your program is fairly slow, but nothing really stands out in the profiler. For simple programs this likely won't happen, but once you get into working on serious applications, this will likely happen more often than you think.

I'm working on serious applications since a little more than 25 years now. And I didn't cared this much about optimization during the 20 last one. Yet, in my youth I was part of the CPC demo scene. So, mostly I learned to code by writing code that was relying on the monitor refresh rate and where everything must be done in exactly one refresh cycle. When writing this kind of code, you don't think "algorithm", you think "processor cycle" ; there's no better school for learning to optimize your code.
But the fact is that nowadays computer don't need this much optimization ; again except for the 3D part, effectively critical process and neuronal process. It's only needed for mobile devices, but they aren't (yet) designed for playing. If I loose a race because I used a pickup against sport cars, it's not the fault of the car, it's my only fault. I should have choose a car designed to race.

The time when optimizing your code effectively gave visible result is far away. Nowadays there isn't (globally speaking) more than 1/100 second of difference between optimized code and not optimized one. Like I said above, it's still needed in some case, but they aren't related to gaming.

EverCursed said:
Fourth point:

You must be registered to see the links

Which say exactly the same thing than me... The 3D part is done in C, the handling in an OOL and Python is strong and fast enough to deal with the common tasks. There's script languages way more faster than Python, one of them would have been used if effectively it was needed.

EverCursed said:
Overall I'll say this. I think many people make two underestimations. One: they underestimate how fast computers actually can be. And two: they underestimate how much of that potential is wasted. You can see up to 5:30 in
You must be registered to see the links
for just how large the difference can be between using OOD and DOD.

Which doesn't mean that OOD isn't efficient, just that DOD is more efficient for critical process. Like every paradigm, it must be seen according to the context and not as a generality. By example there's times where the Python "Easier to Ask For Pardon" is better than the traditional "Look Before You Leap", and there's time where it's way too slow. Everything depend of the context.

EverCursed said:
Have you tried opening Visual Studio lately? You have to wait 30 seconds for it to launch, and then if you dare to use the UI, you have to wait another 10 seconds for it to load.

Seriously ? You're complaining about 40 seconds ? The problem isn't the lack of optimization, it's the impatience of some people. I have a question to return to yours. Have you tried to open an IDE ten years ago ? It needed more than 2 minutes to open... and it had less to do that what Visual Studio actually do. It doesn't matter if it take 40 seconds, because it just really doesn't matter.
I remember in the mid 90's, where we had as intern who was using it's own Mac II at work. Every morning the first thing he did when arriving at work was to press the "start" button. Then he was going for his coffee, greeting everyone, and with chance his computer was ready to use when he was back to his desk... And he didn't complained, it's part of the life and it's all. The computer was here to ease his works, and it's what it effectively did. What represent the 5 minutes, if my memory don't betray me, needed by the said computer to start ? Nothing in regard of the hours he was winning each day by using it.
But nowadays, computers aren't anymore seen as a help to gain time, and so people start complaining for (sorry for the word but there isn't others) stupid reasons, like 40 seconds. When I started to works, we had to change program just to see it the code was compiling. Then we had to browse into the raw result of the compiler to find the effective errors. The make files were to wrote from A to Z, and so on. So personally I'll not care if I need 40 seconds to win half an hour every day ; but well I don't use a compiled language this often now.

Could it be less than 40 seconds ? Yeah, surely. But even with a strong optimization (which would significantly increase the devel time) the gain will not be higher than 10 seconds. And if 40 seconds lost mean nothing, what to say about 10 seconds won ?

EverCursed said:
Edit: Okay I just looked at the apk for that game, and it seems to me that they play video by just stitching together a bunch of PNGs? I don't see any video decoding libraries or anything, hmm...

Do you even ? I haven't looked at the APK distribution, but I'm more than certain that it effectively play videos like any other distribution of Ren'py. But I get that the answer to the question in start of this message is : Yes, you complained about something that is an animation and was never intended to be confused with a video.

recreation · May 16, 2019

The problem with Renpy and animations is that it only caches a maximum of 8 images. A solid animation has way more than 8 images. Another problem is that video animations can be troublesome on phones depending on the used codecs.

polywog · May 16, 2019

EverCursed · May 16, 2019

recreation said:
The problem with Renpy and animations is that it only caches a maximum of 8 images. A solid animation has way more than 8 images. Another problem is that video animations can be troublesome on phones depending on the used codecs.

That's kinda bad to be honest. No way to config that yourself?

I'll have a look at Ren'Py internals later I guess, but there are always workarounds. You can, for example, just store the images in something like a YUV420 format (just make up the container yourself, it's not that hard). That will half the amount of data you need to read from storage. And PNGs are also kind of expensive to process, at a small decrease in size. I'm saying this not knowing if Ren'Py can display YUV natively or not.

recreation · May 16, 2019

EverCursed said:
That's kinda bad to be honest. No way to config that yourself?

I'll have a look at Ren'Py internals later I guess, but there are always workarounds. You can, for example, just store the images in something like a YUV420 format (just make up the container yourself, it's not that hard). That will half the amount of data you need to read from storage. And PNGs are also kind of expensive to process, at a small decrease in size. I'm saying this not knowing if Ren'Py can display YUV natively or not.

I simply switched to jpg after the alpha phase of my game and got a way better performance, plus the game is 10 times smaller now. I still have a bit trouble with some animations, but I found that for most (android) phones for long animations vp8 webm video works best.

EverCursed · May 16, 2019

So I looked a bit through the source and noticed that they're using libav* libraries.

There's some weird history between the libav/ffmpeg team and they ended up splitting a while ago. After that, libav started to be updated less and less, and bugs weren't getting fixed. At this point ffmpeg is superior, and that AFAIK you have to build yourself.

Another thing I see is that they use the sws_scale() function, and from personal experience there are actually two versions of it. One of them is mindblowingly slow, and the other is a 3rd party version and GPL licensed I believe. It comes with ffmpeg but you have to manually enable it (and I don't see them doing that, but might be missing it).

Just two things I saw, might be something else.

grtrader · May 24, 2019

Removed because I am tired of staff behavior

EverCursed · May 25, 2019

grtrader said:
FPS is a bad way to look at game performance it is missleading to say the least.

Yes, I completely agree. It's rarely useful, other than saying "you have this much time to get this much shit done, so you better hurry the f*** up". As for what that means in your specific case, only you know.

grtrader said:
A while back I got into a discussion with some people over the performance of JAVA vs C & C++. JAVA is a good language in a lot of ways. The fan boy of JAVA made the point that the language had in some cases actually out performed C++. He missed a vital fact though in his argument. JAVA's compiler is programmed with C & C++ depending on the JVM you look at.

Actually, it's funny you mention this, because I believe at one point Java WAS faster than C++ at VERY niche and specific applications. But overall this is usually a pointless discussion. If someone asked me what was faster, {some language} or C, I will say that I have no idea, but I'll try to make C run faster if needed in my case.

Edit: Oh and what you said about C++. You are probably right, though I try to stay away from C++ "features" as much as I can. Sometimes trying to find out how specifically something works is just not worth the time. And also, a lot of the time, class/function names are misleading and work some other way. And I don't really want to waste time trying to verify this for every class and compiler.

That being said, you can actually outperform C with C++, so who knows, maybe I should revisit it.

grtrader · May 25, 2019

Removed because I am tired of staff behavior

Bip · May 25, 2019

EverCursed said:
...
First point:
The game I played was a Ren'Py game. My phone is quite decent by current standards. And other Ren'Py games work just fine.
...

I have a not decent laptop, a mid-2012 Apple. I only play through a virtualization of Windows 7 x64 in 2560 x 1440 and I often have a lot of stuff running in the background. In other words, I am sensitive to these problems!
For Ren'Py, I systematically notice a flagrant slowdown as soon as the game is object-oriented. It never crashes, but the processor (well, what I give to the virtualized W7 (2 cores/4 threads @2.6GHz) runs hard. Some videos have slowed down a little, but nothing comparable. With a non-object oriented game, I never had this problem.

The only thing worse than a object-oriented Ren'Py is RPGM MV....

grtrader · May 25, 2019

Removed because I am tired of staff behavior

EverCursed · May 26, 2019

Bip said:
I have a not decent laptop, a mid-2012 Apple. I only play through a virtualization of Windows 7 x64 in 2560 x 1440 and I often have a lot of stuff running in the background. In other words, I am sensitive to these problems!
For Ren'Py, I systematically notice a flagrant slowdown as soon as the game is object-oriented. It never crashes, but the processor (well, what I give to the virtualized W7 (2 cores/4 threads @2.6GHz) runs hard. Some videos have slowed down a little, but nothing comparable. With a non-object oriented game, I never had this problem.

The only thing worse than a object-oriented Ren'Py is RPGM MV....

I think the problem is likely with something else. There are a bunch of layers where performance may be dropping in your case. One thing I would check is that you're not running out of memory in your VM. As grtrader said:

grtrader said:
You get into real problems when your program starts performing page hits from virtual memory.

Bip · May 27, 2019

EverCursed said:
I think the problem is likely with something else. There are a bunch of layers where performance may be dropping in your case. One thing I would check is that you're not running out of memory in your VM. As grtrader said:

I didn't do any tests to see what the problem was. It is just a statement that Ren'Py written in object-oriented was systematically much slower than the others in my VM (8GB of memory for it). So there is a reason for that anyway.
The only thing I'm almost sure of is that the programmer is not (the only) responsible. Afterwards, whether it is an increased need for memory, processor or coffee... But indeed, an increased need for memory seems likely

grtrader · May 27, 2019

Removed because I am tired of staff behavior

An introduction to why your game is slow - Part 1

Newbie

I'm not grumpy, I'm just coded that way.

Newbie

Forum Fanatic

Newbie

Forum Fanatic

I'm not grumpy, I'm just coded that way.

pure evil!

Forum Fanatic

Newbie

pure evil!

Newbie

Member

Newbie

Member

Active Member

Member

Newbie

Active Member

Member