Folding@home for w7forums.com


yodap

No longer shovelling
Joined
Mar 30, 2009
Messages
1,430
Reaction score
340
Thanks Clifford.

My situation yesterday happened while I was out: my folding machine went down for some as yet unknown reason. This was the first time it has happened since the quad-core was installed, the faulty graphics card was removed, and I started using the on-board video. It tried to start back up at the Safe Mode screen, and when no choice was made before the countdown ran out, it went to a black screen with a blinking cursor. This is how I found it.

It was almost halfway through a WU (500k type) when I left. The client restarted, picked up where it left off, and finished in a timely manner, at least 2 days ahead of the deadline.

At the top of the hour after the results were sent, I was credited with only 400 points, and that WU usually credits me with 2,500 to 2,700 points. I have stopped my clients in the past for 2-3 hours and longer and never taken a hit like that. Was some of the data lost, and therefore no bonus awarded?

Back to the computer. I don't know why it went down, and it left no dump files. The core temps have been constantly in the mid-50s °C since I tweaked the airflow a little. It's not OC'd at all. It's been running and folding happily since.

I hope I've explained this well enough.
 

Joined
Mar 8, 2009
Messages
5,063
Reaction score
1,185
Hmmm, 3gb project, up to 97%... "cannot find" (or words to that effect).
4gb project, 78%... ditto. 4 projects like this so far; the little first one was no problem, even through a storm of Windows updates/reboots.
Getting nowhere fast here!! I assume I'm allowed to reboot as necessary?
I reboot all the time and the client starts back from the last check point.

However, I have heard others say it's better to shut the client down properly first. The command-line client can be stopped by pressing Ctrl+C, an old DOS key combination for interrupting a running program. I remember using it many times. :)
 

Elmer BeFuddled

Resident eejit
Joined
Jun 12, 2010
Messages
1,050
Reaction score
251
Cheers CC, I'll try that one. I have been using the "Red X" to close the window. I did once see "Clo....." but I couldn't read it fast enough before, well, the window closed!
 

Fire cat

Established Member
Joined
Mar 7, 2010
Messages
1,157
Reaction score
191
This thread is wayyyy too long for me :eek:
And I don't dare fold because of the BSODs I got. Maybe later.

Anyway, I don't have my comp anymore, so I can't do much at all.
 
Joined
Mar 8, 2009
Messages
5,063
Reaction score
1,185
At the top of the hour after the results were sent, I was credited with only 400 points, and that WU usually credits me with 2,500 to 2,700 points. I have stopped my clients in the past for 2-3 hours and longer and never taken a hit like that. Was some of the data lost, and therefore no bonus awarded?
I don't have anything to say on this one. Hopefully it was a one-time deal and things continue without problems. However, keep us updated; if it continues, I will help you search for an answer.

Back to the computer. I don't know why it went down, and it left no dump files. The core temps have been constantly in the mid-50s °C since I tweaked the airflow a little. It's not OC'd at all. It's been running and folding happily since.
I'm new to overclocking myself and should refrain from giving advice in this area. The first few days with my GTS 450, I dropped my OC. I am now OC'ing my CPU from 2.66 GHz to 3.0 GHz. I have also lowered my voltage from Auto to 1.05 V. My temps are running lower than they were before the OC.
 
Joined
Sep 25, 2010
Messages
81
Reaction score
27
It was almost halfway through a WU (500k type) when I left. The client restarted, picked up where it left off, and finished in a timely manner, at least 2 days ahead of the deadline.

At the top of the hour after the results were sent, I was credited with only 400 points, and that WU usually credits me with 2,500 to 2,700 points. I have stopped my clients in the past for 2-3 hours and longer and never taken a hit like that. Was some of the data lost, and therefore no bonus awarded?
Yodap, you should open your FAHlog.txt file and make sure the project completed all 100 frames before it was sent. Sometimes errors happen that the core recognizes. When that happens, it shuts down that work unit and returns what was done. This is called an EUE (Early Unit End). Stanford may or may not give partial credit for these. I had one on my laptop this past Monday that made it through only 2 frames before it issued an Unstable Machine error. It's hard for me to know whether I received any credit for that one at all. To my knowledge, that is the first EUE I've had. My laptop is not OC'd, nor was I running anything else at the time.
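A quick way to do that check is to scan the log for the last progress line before each core shutdown. This is a minimal sketch, assuming FAHlog.txt lines are formatted like the excerpt posted later in this thread ("Completed N out of M steps (P%)" and "Core Shutdown: STATUS"); the function names are hypothetical and not part of any Folding@home tool.

```python
import re

# Progress lines look like: "[10:56:40] Completed 12500 out of 625000 steps  (2%)"
PROGRESS = re.compile(r"Completed (\d+) out of (\d+) steps\s+\((\d+)%\)")
# The core's exit line looks like: "[11:00:21] ... Core Shutdown: UNSTABLE_MACHINE"
SHUTDOWN = re.compile(r"Core Shutdown: (\w+)")


def summarize_units(log_text):
    """Return a (last_percent, shutdown_status) pair for each unit in the log.

    last_percent is None if no progress line was seen before the shutdown.
    """
    units = []
    last_percent = None
    for line in log_text.splitlines():
        progress = PROGRESS.search(line)
        if progress:
            last_percent = int(progress.group(3))
            continue
        shutdown = SHUTDOWN.search(line)
        if shutdown:
            units.append((last_percent, shutdown.group(1)))
            last_percent = None  # the next unit starts counting from scratch
    return units


def summarize_log_file(path):
    """Read a log such as FAHlog.txt and summarize each unit it contains."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return summarize_units(log.read())
```

A unit that stops short of 100% before its shutdown line is the kind of early end described above; run against the excerpt later in this thread, it would report `(2, "UNSTABLE_MACHINE")`.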

So these things happen, but I understand the frustration. If it happens often, then we need to look at underlying causes. Folding pushes your machine like almost no other program can. While overclocking my main desktop, I ran Intel's BurnTest for a solid hour at maximum settings, followed by more than an hour of MemTest86+, and nothing indicated instability. So I started folding again, and half an hour later I blue-screened.

It is when things like that happen that I try to remind myself that the points are fun, but the science created for Stanford's project is what it is really about. I find myself more concerned that they didn't get the results. I'm glad you continue to fold. I think this is really important work we're all doing here. :D

One other thought: I think you know the difference, but just to cover the bases, when you say you were 2 days ahead of the deadline, did you mean the preferred deadline or the final deadline? Bonuses are only awarded when units are completed before the preferred deadline. To be exact, Stanford uses the time the unit was uploaded to the server as the completion time, not the time your computer finished the unit.
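The upload-time rule can be illustrated with a small sketch. It assumes both times are known as datetimes in the same time zone; `bonus_eligible` is a hypothetical helper, the dates are made up, and treating an upload exactly at the deadline as on time is an assumption.

```python
from datetime import datetime, timedelta


def bonus_eligible(uploaded_at, preferred_deadline):
    """Compare the upload time, not the local finish time, to the deadline.

    Assumption: an upload exactly at the preferred deadline still counts.
    """
    return uploaded_at <= preferred_deadline


deadline = datetime(2010, 11, 5, 12, 0)    # made-up preferred deadline
finished = deadline - timedelta(hours=1)   # done locally an hour early...
uploaded = deadline + timedelta(hours=2)   # ...but results reached the server late
print(finished < deadline, bonus_eligible(uploaded, deadline))  # True False
```

Finishing early on your own machine does not help if the results reach the server after the preferred deadline.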
 

Joined
Nov 30, 2009
Messages
1,752
Reaction score
396
Memtest86+ needs at least 10 passes before you can be comfortable that RAM is not an issue.
Even then, it still can be.

Post the crash dumps and I'll sort it for you shortly.

C:\Windows\Minidump

Copy the files in there to any other folder. Zip them then attach the zip.
 
Joined
Sep 25, 2010
Messages
81
Reaction score
27
I've had 2 abnormalities with folding. From time to time, my FAHMon shows the word *hang* instead of the ETA to completion. I don't really know whether the folding program itself or the monitor is having issues, so I end up closing and reopening both and all proceeds. The other oddness was with the WU I am working on now: I was moving the data from one location to another (while the client was stopped), and it seemed to restart OK but then threw up a message that it couldn't read/access some information and restarted the WU from scratch.
Hi Draceena,

If you have the problem with FAHMon showing *hang* again, try restarting only FAHMon and see if that changes anything. If it does, the problem is likely with FAHMon. If it does not, then perhaps the folding client truly is hung. I have seen this condition in HFM on my machines before. My HFM runs on my main desktop and shows all four of my clients at once. Every now and then, my wife will be doing something on my old desktop that uses more clock cycles, slowing down the folding. If HFM doesn't see an update to FAHlog.txt within the time it thinks it should, it assumes the program is hung.
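The staleness idea can be sketched as follows. This is not HFM's or FAHMon's actual code; it only assumes a monitor compares the log's last-modified time against an expected frame interval, and both the `slack` multiplier and the `looks_hung` name are hypothetical.

```python
import os
import time


def looks_hung(log_path, expected_frame_seconds, slack=3.0, now=None):
    """Guess whether a client is hung from how long its log has gone unwritten.

    Returns True when the log is older than `slack` times the usual frame
    interval. A busy machine can trip this even though the client is only
    running slowly, so True is a prompt to look, not proof of a hang.
    """
    if now is None:
        now = time.time()
    age_seconds = now - os.path.getmtime(log_path)
    return age_seconds > slack * expected_frame_seconds
```

For example, `looks_hung("FAHlog.txt", expected_frame_seconds=8 * 60)` would flag a client whose frames normally take about 8 minutes once its log has been silent for more than 24 minutes, which is exactly the false alarm a busy second machine can trigger.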

As for moving your data from one location to another, Stanford advises against it. That's not to say it can't be done, but I think they officially warn against it because of the problems it can cause. If they didn't discourage it, more people would try it, possibly with the same result you had. That could add up to a lot of lost work for Stanford.

I have moved a work unit from my laptop to my main desktop before, only because I had to shut down all of my machines for a few days. I knew the laptop wouldn't finish the work unit before the shutdown, but the desktop would. So I had to decide between scrapping the work unit altogether (75+% done after 2 days of folding) or taking the risk and trying to transfer it to a more capable machine. I took the risk, the WU finished, and I was able to shut everything down knowing I had no work units hanging in the balance.

Happy folding!
 
Joined
Mar 8, 2009
Messages
5,063
Reaction score
1,185
^^^ +1

I had forgotten about the time limit the monitor has before timing out and displaying an error. I've had this in the past and needed to restart the monitor to correct the issue because the monitor stops monitoring after the error.

I've had a similar experience to Full_Taoer's. Any time you change a client in the middle of a WU, you are taking a chance of losing the WU. That's not to say that you will lose it, though. I have moved folders around as well and knew there was a chance of losing the work.
 

yodap

No longer shovelling
Joined
Mar 30, 2009
Messages
1,430
Reaction score
340
Yodap, you should open your FAHlog.txt file and make sure the project completed all 100 frames before it was sent. Sometimes errors happen that the core recognizes. When that happens, it shuts down that work unit and returns what was done. This is called an EUE (Early Unit End). Stanford may or may not give partial credit for these. I had one on my laptop this past Monday that made it through only 2 frames before it issued an Unstable Machine error. It's hard for me to know whether I received any credit for that one at all. To my knowledge, that is the first EUE I've had. My laptop is not OC'd, nor was I running anything else at the time.
F_T

You are absolutely correct. This was the issue. If such an occurrence happens again, I will try to stitch the log together before the results are sent. Thanks.

Your other points are valid and taken to heart. If I think back just three months, 400 points in a day would be a fantastic day for me.

I'm glad to report that all is running well since yesterday.

Thanks again, Yo
 

Joined
Sep 25, 2010
Messages
81
Reaction score
27
F_T

You are absolutely correct. This was the issue. If such an occurrence happens again, I will try to stitch the log together before the results are sent. Thanks.
I'm not sure what you mean by "stitch the log together." If you found part of the log for that work unit in FAHlog.txt and part in FAHlog-Prev.txt, stitching them together would do nothing in the event of an EUE. I may be misinterpreting what you meant. As an example of what I mean by an EUE, here is an excerpt of my log file for the EUE I had earlier this week:

Code:
[10:39:32] *------------------------------*
[10:39:32] Folding@home Gromacs SMP Core
[10:39:32] Version 2.22 (Mar 12, 2010)
[10:39:32] 
[10:39:32] Preparing to commence simulation
[10:39:32] - Looking at optimizations...
[10:39:32] - Created dyn
[10:39:32] - Files status OK
[10:39:33] - Expanded 2311130 -> 2455585 (decompressed 106.2 percent)
[10:39:33] Called DecompressByteArray: compressed_data_size=2311130 data_size=2455585, decompressed_data_size=2455585 diff=0
[10:39:33] - Digital signature verified
[10:39:33] 
[10:39:33] Project: 2633 (Run 10, Clone 26, Gen 25)
[10:39:33] 
[10:39:33] Assembly optimizations on if available.
[10:39:33] Entering M.D.
[10:39:39] Completed 0 out of 625000 steps  (0%)
[10:48:16] Completed 6250 out of 625000 steps  (1%)
[10:56:40] Completed 12500 out of 625000 steps  (2%)
[11:00:16] mdrun returned 255
[11:00:16] Going to send back what have done -- stepsTotalG=625000
[11:00:16] Work fraction=0.0243 steps=625000.
[11:00:20] logfile size=14634 infoLength=14634 edr=0 trr=25
[11:00:20] logfile size: 14634 info=14634 bed=0 hdr=25
[11:00:20] - Writing 15172 bytes of core data to disk...
[11:00:20]   ... Done.
[11:00:21] 
[11:00:21] Folding@home Core Shutdown: UNSTABLE_MACHINE
[11:00:24] CoreStatus = 7A (122)
[11:00:24] Sending work to server
[11:00:24] Project: 2633 (Run 10, Clone 26, Gen 25)
 
[11:00:24] + Attempting to send results [November 1 11:00:24 UTC]
[11:00:25] + Results successfully sent
[11:00:25] Thank you for your contribution to Folding@home
[11:00:29] - Preparing to get new work unit...
[11:00:29] Cleaning up work directory
[11:00:30] + Attempting to get work packet
[11:00:30] Passkey found
[11:00:30] - Connecting to assignment server
[11:00:31] - Successful: assigned to (171.64.65.54).
[11:00:31] + News From Folding@home: Welcome to Folding@home
[11:00:31] Loaded queue successfully.
[11:00:35] + Closed connections
[11:00:40] 
[11:00:40] + Processing work unit
[11:00:40] Core required: FahCore_a3.exe
[11:00:40] Core found.
[11:00:40] Working on queue slot 01 [November 1 11:00:40 UTC]
[11:00:40] + Working ...
[11:00:40] 
[11:00:40] *------------------------------*
[11:00:40] Folding@home Gromacs SMP Core

As you can see, it only got to 2% before it wrapped everything up and sent it back, then picked up the next WU and went on like nothing had happened.

Like C_C said, let's hope it was a one-time occurrence. :)

And speaking of C_C, congrats on breaking 1 Million points!!! There's a nice bump in your points this past week that must be from the new GPU. That second million ought to go by much faster than the first.
 

yodap

No longer shovelling
Joined
Mar 30, 2009
Messages
1,430
Reaction score
340
As you say, "Let's hope."

My thinking (with the wrong terminology) was that if I could show Stanford that the WU was in fact completed, with a brief shutdown in the middle, then perhaps...... :)
I am aware that log manipulation should be impossible.

I'm also fully aware the problem is on my end in this case.
 


draceena

That Crazy Amazon Chick!
Joined
Jan 17, 2009
Messages
773
Reaction score
182
Thanks for the clarification. The folder move was a one-time thing, and I definitely won't do that again. As for the *hang*, yes, I'll admit I was doing some processor-intensive work before I saw the message. In the future, I'll just restart the monitor and go from there. Thank you!
 

Nibiru2012

Quick Scotty, beam me up!
Joined
Oct 27, 2009
Messages
4,955
Reaction score
1,302
Okay folks... now that I have a Radeon HD 5770, I'm gonna give this folding thing another shot. This card runs 10 degrees C cooler than my old HD 3850, plus it's got some butt-kicking speed, over a billion transistors, 1 GB of GDDR5 RAM, etc.
 
Joined
Mar 8, 2009
Messages
5,063
Reaction score
1,185
Nibs you might want to check out the GPU Tracker V2 program. It's working out great for both my CPU and GPU.

Once you have set up the configuration, you can start the clients and minimize the app to the notification area, hiding all Taskbar icons.
 

yodap

No longer shovelling
Joined
Mar 30, 2009
Messages
1,430
Reaction score
340
Okay folks... now that I have a Radeon HD 5770, I'm gonna give this folding thing another shot. This card runs 10 degrees C cooler than my old HD 3850, plus it's got some butt-kicking speed, over a billion transistors, 1 GB of GDDR5 RAM, etc.
Great Nibs.
It looks like you can do some damage with that. Start slow and see how it goes.
 

Elmer BeFuddled

Resident eejit
Joined
Jun 12, 2010
Messages
1,050
Reaction score
251
Hmmm. I downloaded HFM.Net v0.6.1 Beta..... and it's empty. I can't for the life of me figure out how to get it to show anything :( Yet FAHMon just... did it.
 

Elmer BeFuddled

Resident eejit
Joined
Jun 12, 2010
Messages
1,050
Reaction score
251
OK, I've got this far.. Does this look about right?
[Attached screenshot: Capture.PNG]

I'm looking at the Team:7 and Project:77 down at the bottom, assuming Team 7 is Windows 7 Forums.
 

yodap

No longer shovelling
Joined
Mar 30, 2009
Messages
1,430
Reaction score
340
Hi Elmer,

It looks good except that your status should be green. Sometimes it's yellow when you first start the program.
 
