Random Crashes under Windows 10
#471235
02/24/18 17:40
Joined: Apr 2002
Posts: 680
Germany
Turrican
OP
User
Hello everyone,

I am currently finalizing my project. I am supported by Kartoffel, who wrote the rendering pipeline and some of the tools for the game. There's a serious crash bug in the game that we just can't get past. The project has become relatively complex, the error occurs very randomly, and we can't find the cause even after literally several hundred hours of work. I hope that someone from the forum or the developers can help us out, or at least give us some advice. I'd also like to make sure that this is not a global engine bug, so if you experience similar issues, please let us know. Things are extremely urgent right now for organizational reasons and because I'm planning to launch soon, so I'd be more than grateful for any help!

First off, some basic information: I am developing the game on a Windows 7 system, but have been testing regularly on Win 10 systems. For the last few months, however, I did not run these tests as regularly as before, since I've focused completely on programming the user interface, the menus and the DLL implementation (Steam and misc. file operations). It's possible that the crash bug crept in somewhere during that time, but I can't say for sure. It may also have been there for much longer, and I just didn't notice it due to the lower complexity of the game at that time.

Bug Description:
- The project produces random crashes under Windows 10 (so far tested on five systems).
- The game runs fine under Windows 7.
- Both the published and source versions are affected.
- The crash can happen at any time: Sometimes the whole game can be played through, sometimes it already crashes after a few seconds.
- The crashes are very random and their cause is difficult to pinpoint. These are our best leads for reproducing them; the problem appears:
  - while attempting to load levels (no matter if triggered by gameplay functions or by user input/UI)
  - when pausing the game (= setting freeze_mode to 2 + calling some UI functions)
  - when moving the mouse cursor over or away from a specific UI element (without clicking)
  - less often during or after a message is displayed, or at any random point in the game.
The crash usually appears in one of these forms:

1) Often:
The screen freezes for a second, then a standard Windows error message appears: "acknex.exe stopped working"

2) Less frequent:
An Acknex error message appears: "Script crash in load_map". After closing the message, it continues as described in [1] (in most cases).
Please note: This behavior is not limited to a certain level. It also happens with levels that work fine in every other case.

3) Rarely:
During the loading process, an Acknex error message appears: "Bad file format: stage####.wmb". After closing the message, it continues as described in [1].
Please note: This also happens with levels that otherwise work without any problems.
Here's our current assumption, as we already expressed in a different thread:
"(...) it looks like these crashes only happen on Windows 10 so far. If it's memory-related, I still think it's something produced by the code, though. The different operating systems might handle memory allocation/management differently, which makes the crashes we're having more likely on Win 10."

Measures taken so far:
As I said, the bug is difficult to narrow down. We have already taken numerous measures for error analysis, unfortunately without noticeable improvement. Where there was an improvement, it was only that the crash was delayed or happened under different circumstances. Our latest approaches include:

A) Disabling plug-ins
We do use some DLL plugins, mostly self-written ones. Of course, we have already tried to deactivate these plugins completely and also made sure to remove them from the acknex_plugins folder. It didn't do any good.

B) Narrowing it down
By turning off large blocks of code and/or complete features (menu, UI, AI, sound, player, controls + more + combinations), we have tried to get to the root of the problem. That wasn't too successful either. Whenever there was a perceptible improvement, we re-activated the code line by line to see where it came from, but usually to no avail. During one of these tests I noticed the following erratic behaviour:
- I put a 'return' right at the beginning of the following function, and no longer experienced the same crashing behaviour as before:
function player_spawn()
{
return;
(...)
}
- While re-adding lines of code, I noticed that it didn't matter what I put before the 'return'. Anything, even a comment, would re-enable the crashes.
function player_spawn ()
{
// test
return;
(...)
}
It is possible that this was just a coincidence, but I thought it was strange enough to mention it here.

C) Disabling certain model files
Our main menu uses eight background models spawned via ent_create and animated by code. While narrowing things down, I also tried disabling these models, just to see what would happen. The crash actually occurred less frequently afterwards: it still happened, only much later and less often. I wonder where this came from. Are these model files corrupted? Could there be a mismanagement of internal memory areas when I load these models? When I don't load them, could there be a kind of "shift" in memory allocation, so that the crash occurs less often? I'm not sure what this behaviour tells me.

D) Replacing all model files
Encouraged by the partial success described above, and because some users on this forum reported bugs with "corrupt" model files, I replaced all 1,615 models in the project folders with a simple dummy object. It didn't fix the crash, but I don't think it occurred as frequently anymore. Of course, it was difficult to really "play" the game this way, so I did not carry out extensive tests. Is it possible that other file types could be corrupted? Should I try to replace sounds, media and textures with dummies?

E) Manual termination of all "critical" functions
A very, very time-consuming approach that I have been pursuing over the last few days: I make sure that all "critical" functions are terminated before loading. In detail: I track all potentially critical loops by having each of them count a variable up when the function starts (and back down when it ends). The level loading routine waits until all these functions have finished and the sum of all variables reaches 0. It didn't improve anything (but at least I was able to fix some sporadic pointer errors this way).
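The bookkeeping behind approach E) can be sketched in plain C. This is a minimal sketch, not the project's actual code: the names g_active_loops, loop_enter, loop_leave and level_load_allowed are hypothetical, and a single shared counter stands in for one variable per function (the sum of the per-function variables carries the same information).

```c
#include <assert.h>

/* Hypothetical global counter: number of tracked loops still running. */
static int g_active_loops = 0;

/* Call once at the start of every tracked ("critical") function. */
static void loop_enter(void)
{
    g_active_loops++;
}

/* Call on every exit path of a tracked function (including early returns). */
static void loop_leave(void)
{
    g_active_loops--;
    assert(g_active_loops >= 0); /* more leaves than enters = bookkeeping bug */
}

/* The level loader may only proceed once every tracked loop has ended. */
static int level_load_allowed(void)
{
    return g_active_loops == 0;
}
```

In a lite-C game the loader would poll this once per frame (a while loop around wait(1)) instead of blocking. The assert in loop_leave is the useful part for debugging: a missed loop_leave on an early return keeps the counter above zero forever, while a doubled one drives it negative.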
F) Data analysis
Of course we have studied the acklog, built in countless diag instructions, use our console and write out a task list in situations we think are critical. Problematic variables and objects are monitored via on-screen debug displays. The only thing we haven't tried yet is adding a ton of sys_marker instructions, as Superku recommended. I'm not sure if this would help us pinpoint the cause of the problem any better, but it might be worth a try.

Next steps:
Here's what I could still try. I must say I don't have much more on my list right now.

I) Check/disable windows.h functions
We use some features from windows.h. Could there be a problem with deprecated functions that leads to Win10-only crashes?

II) Check plugin source code
Although I already ruled out the plugins as a cause, I'm going to go through them step by step and look for possible error sources.

III) PROC_GLOBAL
In the course of the menu programming during the last few months, quite a large number of PROC_GLOBAL instructions have found their way into my code. Since Superku mentioned that this mode may not always work correctly, I will soon deactivate it for testing purposes.

IV) Further narrowing it down
We still have a few ideas about which parts of the code we could investigate further, even though I have the feeling this won't shed any more light on the case. We'll try this over the next few days.

Related threads:
http://www.opserver.de/ubb7/ubbthreads.php?ubb=showflat&Number=470982#Post470982

TL;DR: Game crashes only under Win10, tried almost everything, please help!
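For reference, the general "breadcrumb" idea behind dropping markers through the code (as mentioned under F) can be sketched in plain C: record a short tag before each risky step, so the last few tags show roughly where execution was when things went wrong. This is a hypothetical helper (crumb, crumb_recent, CRUMB_COUNT, CRUMB_LEN are made up), not the engine's sys_marker; in a real game the buffer would also have to be written somewhere (e.g. a log file) to survive a hard crash.

```c
#include <assert.h>
#include <string.h>

#define CRUMB_COUNT 8   /* how many recent markers to keep */
#define CRUMB_LEN   16  /* max marker length including the terminator */

static char g_crumbs[CRUMB_COUNT][CRUMB_LEN]; /* zero-initialized: all "" */
static int  g_crumb_next = 0;                 /* slot that is written next */

/* Record a short tag before entering a risky code section. */
static void crumb(const char *tag)
{
    assert(tag != NULL);
    strncpy(g_crumbs[g_crumb_next], tag, CRUMB_LEN - 1);
    g_crumbs[g_crumb_next][CRUMB_LEN - 1] = '\0'; /* truncate long tags safely */
    g_crumb_next = (g_crumb_next + 1) % CRUMB_COUNT;
}

/* Return the n-th most recent tag (0 = latest), or "" for an unused slot. */
static const char *crumb_recent(int n)
{
    assert(n >= 0 && n < CRUMB_COUNT);
    int idx = (g_crumb_next - 1 - n + 2 * CRUMB_COUNT) % CRUMB_COUNT;
    return g_crumbs[idx];
}
```

A ring buffer keeps the cost constant however many markers are placed, which matters when they end up inside per-frame loops.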
Re: Random Crashes under Windows 10
[Re: Turrican]
#471236
02/24/18 18:19
Joined: Sep 2003
Posts: 6,861
Kiel (Germany)
Superku
Senior Expert
|
Senior Expert
Joined: Sep 2003
Posts: 6,861
Kiel (Germany)
|
Oh boy, this sounds awful (and gives me unpleasant flashbacks)! Are you aware of the following thread (the issues, random source code changes leading to different results/crashes, file format stuff)?
http://www.opserver.de/ubb7/ubbthreads.php?ubb=showflat&Number=458100#Post458100

Which engine version are you using, 8.47.1? If you aren't... you should.
Do you use view/sky entities? If so, how are they created?
Do you use default acknex sounds? Have you tried disabling them (-ns)?
About plugins: As I may have said before, I once managed to really mess up my game, with all sorts of random memory issues, inexplicable behavior and random crashes, because I accidentally sent a NULL string to a Steam API function, which really did not like that. I assume you've been experiencing those issues since well before you started with the Steam implementation, right?
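The NULL-string anecdote suggests guarding every string before it crosses into a plugin or third-party API. A minimal sketch in plain C: external_api_strlen is a made-up stand-in for such an API (it is not a Steam function) and non_null is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an external API that, like many C APIs, does not accept NULL. */
static size_t external_api_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') /* dereferences s: undefined behaviour if s == NULL */
        n++;
    return n;
}

/* Hypothetical guard: substitute an empty string for NULL before the call. */
static const char *non_null(const char *s)
{
    return s != NULL ? s : "";
}
```

Whether an empty string is an acceptable substitute depends on the API; sometimes skipping the call (and logging it) is the better fix, because it also reveals where the NULL came from.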
"If the result doesn't simply look tremendously good, you should start all over again..." - Manual Check out my new game: Pogostuck: Rage With Your Friends
Re: Random Crashes under Windows 10
[Re: Superku]
#471238
02/24/18 19:57
Joined: Apr 2002
Posts: 680
Germany
Turrican
OP
User
|
OP
User
Joined: Apr 2002
Posts: 680
Germany
|
Yep, it _is_ absolutely awful. I know the thread you posted and I've read it many times. I recently checked all my models to make sure there are none with "mixed" or only empty skins. It's possible that I missed one, so I'll be going through all of them again tomorrow, just to make sure. But as far as I remember, that bug had been fixed anyway, right...?

That brings me to your next question: Yes, I'm using the latest version (8.47.1 Pro, if that makes any difference), and made sure the guys supporting me do so as well.

Next: Yes, there's a single sky entity in the game. It's defined like this:
ENTITY* skyscene =
{
type = "sky_scene05a.mdl";
layer = 3;
scale_x = 380;
scale_y = 380;
scale_z = 380;
y = 28000;
flags2 = SKY | SCENE;
flags = FLAG2;
albedo = 100;
}
Things I'm doing with it at runtime:
- change of position vector
- change of scale_x vector
- visibility control (via flags2 - SHOW)
- color manipulation (by modifying red/green/blue)
- ent_morph'ing to different, level-specific models after level_load
Sound: No, I did not try that using the engine parameter. What I did so far was to test with sound_vol set to zero, and that didn't make a difference (if I remember correctly). Will check that tomorrow.

Plugins: In fact, I don't recall having these issues before starting the Steam implementation. However, I changed so many things around that time that it could easily be almost anything else. I'd rule this one out anyway, as the Steam plugin is currently fully disabled. I also removed the DLLs from my Win10 test environment, just to make sure. Good point about the NULL strings, though; I will make sure to check all the strings I'm sending.

While we're at it: in my leaderboards code I have this slightly suspicious part where I'm fiddling around with a local short array and two char pointers. Maybe let's have a look at it:
function mm_lb_fill() // used to fill lb strings
{
(...)
// defining local stuff...
short short_array_x1024[1024];
char* _nameUTF8="";
char* _nameFINAL[10];
(...)
// The function ends here.
// No, I'm not removing the char pointers - should I?
}
Remember: As I said, the Steam implementation is currently turned off, so this part of the code should never be executed anyway. But since I recently learned that local STRING* pointers should a) be created using str_create(), rather than simply being defined like 'STRING* test = "";', and b) be str_remove'd at the end of the function, I keep asking myself if there's an equivalent guideline for chars. Could the way I define them here be problematic? Could the precompiler come across this part of the code and go totally nuts? By my understanding it shouldn't, but I'm starting to look for potential mistakes everywhere.
Re: Random Crashes under Windows 10
[Re: Turrican]
#471239
02/24/18 21:22
Joined: Sep 2003
Posts: 6,861
Kiel (Germany)
Superku
Senior Expert
|
Senior Expert
Joined: Sep 2003
Posts: 6,861
Kiel (Germany)
|
Try removing the sky entity from your game completely, especially if it has a (big) texture. Those things are quite suspicious to me: many years ago I had random polygons/planes visible in my levels and some objects missing after level load, and it turned out that, for some reason, the big skybox texture/sky entity caused it (or seemed to have caused it).

Many years ago I wrote a program in lite-C to solve a puzzle. The first algorithm was implemented recursively. It worked fine up to tile 100-150 maybe, then it got weird. It would tackle the wrong tiles after that, change stuff or erroneously regard stuff as wrong/correct. I had to learn that stack memory is limited, and that an allocation like
short short_array_x1024[1024];
might "fail" or not be quite what you wanted after some recursion. This most likely does not apply to your case, but sharing it might be useful for you after all. It's probably better to sys_malloc (and sys_free) large(r) arrays dynamically.

Quote:
I recently learned that local STRING* pointers should a) be created using str_create()

STRING(*) pointers do not need to be manually allocated; STRING objects/"instances" do, though. This, for example, is fine for read-only use:
STRING* strPointer = NULL; // you could write, for example, void* instead of STRING* as well
strPointer = localisationGetString(DIALOG_001); // returns a "constant" string, for example a text.pstring element
// ... use strPointer for read-only access/drawing text, don't manipulate it ...
If you want to manipulate the string, you should str_create("") a new STRING object, though:
STRING* strPointer = str_create("");
// ... manipulate and use the string ...
ptr_remove(strPointer); // every str_create needs a matching ptr_remove
You can write something like STRING* strPointerArray[10]; as well, of course; it depends on what you want to do with it. If you write char* _nameFINAL[10]; then you will have an array of 10 char pointers, which does not sound like what you were going for with _nameFINAL. char _nameFINAL[10]; on the other hand, pretty much is a pointer named _nameFINAL that points to 10 (char) bytes, which you can use for string/char manipulation, e.g. for a name of up to 9 characters plus the terminating null byte. Example:
char _nameFINAL[10];
_nameFINAL[0] = 'A'; // or _nameFINAL[0] = 65;
_nameFINAL[1] = '8';
_nameFINAL[2] = 0; // terminating null byte ('\0'): marks the end of the string
draw_text(_nameFINAL,400,400,COLOR_RED);
// or something like str_cat(strPointer,_nameFINAL);
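The char-array points in this reply can be checked in plain C, outside the engine. A minimal sketch: build_name is a hypothetical helper mirroring the _nameFINAL snippet above, and copy_upper shows the "copy into your own buffer first" rule for text that starts out as a string literal. This assumes lite-C follows standard C semantics for char arrays and literals.

```c
#include <assert.h>
#include <string.h>

/* char name[10]:   10 bytes of writable storage (up to 9 chars + '\0').
   char *names[10]: 10 pointers, with no character storage of their own. */

/* Fills a writable buffer and terminates it so string functions know
   where the text ends. */
static void build_name(char name[10])
{
    name[0] = 'A';
    name[1] = '8';
    name[2] = '\0';
}

/* A pointer to a string literal is read-only: never write through it. */
static const char *literal = "Wow!";

/* To change such text, copy it into a buffer you own first. */
static void copy_upper(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0'; /* strncpy may not terminate on its own */
    for (size_t i = 0; dst[i] != '\0'; i++)
        if (dst[i] >= 'a' && dst[i] <= 'z')
            dst[i] = (char)(dst[i] - 'a' + 'A');
}
```

This also applies to a declaration like char* _nameUTF8=""; in the leaderboard code: as long as it points at the literal, nothing may be written through it.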
If you write something like char* myChr = "Wow!"; you should treat that as static and not write into it. Likewise, STRING* myStr = "Wow!"; inside a function is in this case just a pointer (which could be void* as well) that points to a static char array (if I'm not mistaken); it should not be changed and is probably not what you want.
What I usually do (although it may be frowned upon): I have a bunch of global objects, STRING* tmpStr1 = ""; STRING* tmpStr2 = ""; STRING* tmpStr3 = "";, which I use in functions for temporary string manipulation, so I don't have to str_create/ptr_remove every time.
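Back to the stack-memory point at the top of this reply, sketched in plain C: malloc/free stand in for the engine's sys_malloc/sys_free, and sum_levels is a made-up recursive function. Each level allocates its 1024-element scratch buffer on the heap and frees it before recursing, so at most one buffer exists at a time, however deep the recursion goes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical recursive routine that needs a 1024-element scratch buffer
   on every level. Returns -1 if an allocation fails. */
static long sum_levels(int depth)
{
    assert(depth >= 0);
    if (depth == 0)
        return 0;

    short *buf = malloc(1024 * sizeof *buf); /* heap instead of stack */
    if (buf == NULL)
        return -1;

    for (int i = 0; i < 1024; i++)
        buf[i] = (short)(i % 2); /* 512 ones, 512 zeros */

    long local = 0;
    for (int i = 0; i < 1024; i++)
        local += buf[i];
    free(buf); /* every malloc needs a matching free; done before recursing */

    long rest = sum_levels(depth - 1);
    return rest < 0 ? -1 : local + rest;
}
```

A local short buf[1024]; would instead reserve that buffer (typically 2 KB) in every stack frame of the recursion.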
Re: Random Crashes under Windows 10
[Re: Superku]
#471261
02/25/18 15:48
Joined: Apr 2002
Posts: 680
Germany
Turrican
OP
User
|
OP
User
Joined: Apr 2002
Posts: 680
Germany
|
Oh boy, that could definitely be a problem. I just checked, and while I do have a sort of "update fonts" routine, it seems I'm only calling it before actually initializing all fonts via AddFontResource. I never thought that could be a problem, as all fonts look correct in game.
Hang on, I'll give it a shot.
EDIT: Still no luck. I completely removed the AddFontResource calls; fonts are defined right at the start and never changed. I had the impression that it crashed later/after more attempts, though. But nonetheless, it crashed several times during and after level changes.
Last edited by Turrican; 02/25/18 16:27.
Re: Random Crashes under Windows 10
[Re: Turrican]
#471280
02/26/18 13:39
Joined: Apr 2002
Posts: 680
Germany
Turrican
OP
User
|
OP
User
Joined: Apr 2002
Posts: 680
Germany
|
Just a small update with the things I tried today:

As I wrote earlier, there are a lot of "proc_mode = (...)" instructions in my source. I suspected that some of these might have led to my Win 10 issues, so I went through and removed *all* of them, with three exceptions:
1. A _startup function that handles the pause state and the visibility of some related UI elements in a while loop.
2. A _startup function that handles controls. It recognizes key presses and sets variables accordingly, for other functions to evaluate.
3. And finally, a function that handles mouse control.
As these contain important control features that need to run during pause, I simply kept proc_mode = PROC_NOFREEZE inside their while loops (that's correct, isn't it? We should put proc_mode calls directly into a loop, right?).

Now, some engine error messages I had before are seemingly gone. I haven't tested this thoroughly, but it could be that it fixed some other bugs. Nice.

But the crash bug itself, with that "acknex.exe has stopped working" message, is still present! And I can reproduce it: I start the game, begin a new game from the main menu, the level loads, then I press <P>, which activates my pause function, and the game crashes.

So my next idea was obviously: then it must be the freeze_mode = 2 call itself. I commented it out; it still crashes. Okay, crazy. Then I started looking for other things that happen when I press <P>. Among other things, the mouse control gets activated. So I went there, copied the mouse control function, tried a few things, reverted them, and basically ended up with *two* instances of my mouse handling function, both exactly the same except for their name, and I couldn't reproduce the bug anymore. Here, see for yourself:
function handle_mouse()
{
if(MOUSECONTROL_ACTIVE) { return; }
MOUSECONTROL_ACTIVE=1;
mouse_mode = 1;
mouse_map = tga_mousemap_gui;
mouse_pointer = 0;
var _button_pressed=1;
while(1)
{
proc_mode = PROC_NOFREEZE;
if(MOUSECONTROL_ENABLED)
{
if(mouse_mode > 0) // move it over the screen
{
if(mouse_moving)
{
vec_set(mouse_pos,mouse_cursor);
}
if(mouse_valid) // inside engine window?
{
// enable keyboard/gamepad controls
if(inp_force.x != 0 || inp_force.z != 0)
{
mouse_pos.x+=(inp_force.x*MOUSECONTROL_JOYSPEEDFACTOR)*time_step;
mouse_pos.y-=(inp_force.z*MOUSECONTROL_JOYSPEEDFACTOR)*time_step;
}
}
}
else
{
mouse_mode=1;
}
}
else
{
mouse_mode=0;
}
wait(1);
}
}
function handle_mouse_old()
{
if(MOUSECONTROL_ACTIVE) { return; }
MOUSECONTROL_ACTIVE=1;
mouse_mode = 1;
mouse_map = tga_mousemap_gui;
mouse_pointer = 0;
var _button_pressed=1;
while(1)
{
proc_mode = PROC_NOFREEZE;
// if(GAME_MODE==0 || freeze_mode==2)
if(MOUSECONTROL_ENABLED)
{
if(mouse_mode > 0) // move it over the screen
{
if(mouse_moving)
{
vec_set(mouse_pos,mouse_cursor);
}
if(mouse_valid) // inside engine window?
{
// enable keyboard/gamepad controls
if(inp_force.x != 0 || inp_force.z != 0)
{
mouse_pos.x+=(inp_force.x*MOUSECONTROL_JOYSPEEDFACTOR)*time_step;
mouse_pos.y-=(inp_force.z*MOUSECONTROL_JOYSPEEDFACTOR)*time_step;
}
/*
// commented out, just to see if there's a problem with this...
if(inp_button1)
{
if(!_button_pressed)
{
mouse_event(MOUSEEVENTF_LEFTDOWN | MOUSEEVENTF_ABSOLUTE, 0, 0, 0, 0);
_button_pressed=1;
}
}
if(!inp_button1)
{
if(_button_pressed)
{
mouse_event(MOUSEEVENTF_LEFTUP | MOUSEEVENTF_ABSOLUTE, 0, 0, 0, 0);
_button_pressed=0;
}
}
*/
}
}
else
{
mouse_mode=1;
}
}
else
{
mouse_mode=0;
}
wait(1);
}
}
Doesn't make ANY sense at all, I know, but that's the current state. It strikingly reminds me of the problem I described under B) in my initial post, and I simply don't get what is going on here. This is really driving me insane. With this bug I'm looking for a needle in a haystack. I can't even tell what to make of this finding: does it indicate something? It would really help if I could at least pin it down to something like "oh, there must be a pointer I'm using the wrong way" or maybe a broken resource file, but all I have is this immensely strange behaviour, and I don't even know why it occurs. Any hints? JCL?
Last edited by Turrican; 02/26/18 13:52. Reason: Inserted a comment-block to second mouse function to really depict the current state of the function
Re: Random Crashes under Windows 10
[Re: jcl]
#471350
02/28/18 17:42
Joined: Jun 2009
Posts: 2,210
Bavaria, Germany
Kartoffel
Expert
|
Expert
Joined: Jun 2009
Posts: 2,210
Bavaria, Germany
|
So, after some more testing, it looks like some of the problems we were having are related to the nexus value being too low. After we increased it, a lot of the crashes seem to be gone. We're not 100% sure whether low nexus really is the issue or whether increasing it just bypasses the actual causes of the crashes, but so far it looks promising.
POTATO-MAN saves the day! - Random