Hit Buffer production logbook.
------------------------------

Short electronic summary of what is recorded in more detail in the paper book.

HB 1 & 2 built in '99, tested at Fnal/Pi/Ts (see paper book).
All logic problems found then have been corrected in the Xilinxes. For
production, only added silk screen and grounded the floating inputs on HRVME.

Production boards 1-2 tested in Jan/Feb 2000: all OK except some reversed LEDs
and one unsoldered ground pin (bent before mounting) on one HCT123 chip
(exact location lost; found by Simone, fixed by Stefano).

Full production (14 boards) arrived May 10.

First tests in Pisa: perform 2 tests on all new HB's (5-18):
  1. using hb_menu: ts, te, in, idt, enw, enr
  2. run hb_test_random for about 2 minutes (2K iterations of 200 events)
All HB's pass test 1 except HB #16. Two HB's (#7, #10) fail test 2.
Some scattered LED problems (HB #5 and #8).

May 11:
  HB #16: after a few successful operations, always returns BERR_.
          Fixed by removing a small solder ball (needed no heat) between
          VMEPLD pins 72-73.

May 12:
  HB #7:  input Hit bit 0 intermittently wrong. Cold solder on FIFO,
          resoldered by Marcello.
  HB #10: fails AM_MAP check. RAM U31 always returns 0. Cold solder on
          WR_ pin, resoldered by Marcello.
  HB #5:  power LED was bad, replaced by Gianfranco.
  HB #8:  output HOLD LED lights in place of DS. Traced to bad solder on
          pin 8 of U80 (HCT123 ground pin). Pin was bent before mounting
          (as in HB #4). Fixed by Stefano.

May 15/24:
  Extensive tests in Trieste. On all boards 5-18 the following 3 tests
  have been performed, all for Nlayer=7.
  1. hb_quick_test.c
     - R/W all registers with 0, 1, rotating 0, rotating 1
     - R/W SS/AM with random numbers and their complements at least 4 times
     - check all LEDs
     - check all data bits for stuck/short on FIFO/ROAD/OUTPUT
     - check in/out Hold, Fifo flags
     - check SVT and CDF error LED and backplane line
     This program was run twice on each board.
  2. hb_hlm_test.csh
     - this is the program developed by Simone and Stefano at Fnal in
       October 1999 that discovered the HRVME reset problem. It tests all
       possible combinations of SuperStrip addresses in the Tag Ram,
       checking all Hit List addresses generated in all possible
       multiplicity cases.
     This was run once on each board.
  3. mhb_test_random.c
     - latest version of the HB random test
     - checks data path and Spy Buffers with randomized data. SuperStrips
       are kept simple while AM_MAP is random (i.e. all possible roads are
       tested)
     - was run for at least 10K iterations on each board, many boards for
       much longer (up to 100K). On average 100 events in each iteration;
       output data is checked at all times, Spy Buffers once every 20
       iterations (5 for the last boards).
     This was run more extensively on some boards; will keep running it
     until boards are shipped, to check for infant mortality.

  All boards passed all tests. No problem was found in Trieste so far.
  Some intermittent errors on HB 13 were apparently due to problems in
  transmitting roads from the Merger; could not reproduce them.
  Sometimes (about 5 times in the week) got a read of all 0's for the HB
  Spy Buffer Pointers. Guess it has to do with temperature. Added code to
  catch it and dump details.
  All HB's are in one crate, one per slot. Tests usually run with two
  neighboring boards powered up.

May 25/26:
  For 12 hours kept all even, then all odd Hit Buffers powered on, running
  VME checks on them with no errors, just as an additional heat test.

May 29:
  Resume VME test on odd/even. Today is odd (7,9,11,13,15,17).
  Run MHB_TEST_RANDOM on 2 HB. Today is 5-6.
  Got one road transmission error from merger #11. No clue. Keep running.
  Fixed small memory leak in mhb_test_random.
  Stop tests on May 30. Number of iterations:
    Random test ~90K
    VMEonly     = 300

May 30:
  VMEonly: today is even (6,10,12,14,16,18), no 8. Run till May 31.
  400 iter OK.
  MHB_TEST_RANDOM, today is 7-8. Got error after 1700 iter.
  All Spy Buffers are wrong, spy pointers are different in all boards
  ==> multiboard error! Makes no sense. Maybe a glitch induced by the test
  merger being inserted/extracted. Keep running, 6000 OK. Halt to debug code.
  Resume test at 18:30. Got another very confusing error after 1K iter.
  Leave it for tomorrow.

May 31:
  VMEonly: today is odd (5,9,11,13,15,17), no 7.
  Leave 7-8 to understand yesterday's problem.
  Configuration was:
    HB0 = vmesvt1/7
    HB1 = vmesvt1/8
    MRG0 (#8)  = vmesvt2/5
    MRG1 (#11) = vmesvt2/9
  HB output goes to mergers' input C.
  Apparently:
  - HB0 and HB1 hit spys are OK
  - HB0 road_spy pointer was reset before the last bunch, otherwise OK
    (so the dump is wrong, but the last roads are OK)
  - HB0 and HB1 output spys are OK
  - HB0 output in mrg0 is OK
  - HB1 output in mrg1 is BAD: 9 extra words at the beginning:
      402087 202087 4020a7 6020a7 6020a7 6020a7 6020a7 6020a7 6020a7
  N.B.
  - HB0 road spy pointer register address is 34h, same as merger C Spy.
  - last good EndEvent word in mergers C was 6020a7. The first 3 extra words
    are a "subset" of this, then it repeats 6 times.
  ==> looks like, when it was written, DS_ to Merger, or DS_ in merger ???,
      "bounced", strobing the data 10 times with a few bits glitching ???
  First thing could be a "VME glitch". Second is much harder... bad cable ?
  Maybe interference with the VMEONLY test running on the same crate, an
  output_write command somehow "seeped through" ?
  Give up on an explanation and keep running...
  Restart May 31 16:50. Stopped by mistake by Mauro at 17:30 (init to merger).

June 1st:
  Restart MHB test random at 10:30. Leave VMEonly running.
  Will stop tests and power off HB's this afternoon till next week.
  Found more problems like yesterday's. Found some ghost vmeonly processes...
  maybe that is the cause ?
  Stopped vmeonly and ran ~20 hours with no error.

June 12:
  Resume tests. Run MHB_TEST_RANDOM again on HB #7-8.
  Mrg0 is #15, Mrg1 is #11 (suspected of previous strange problems).
  No other HB powered up, no fans.
  Got some single bit errors in output from HB 8, gone after power cycling
  (Roboclock failing lock ?). Also got, just once, a case of bad VME write
  of Merger 0 output spy (word n got written at locations n and n-1,
  overwriting n-1).
  Ran for 190K iter with no error on Mergers input C.
  ==> must have been all the fault of the vmeonly ghosts...
  HB #7/#8 are fine. Merger #11 is fine.

June 13:
  MHB_TEST_RANDOM on HB #9 and 10. No fans. Mrg #15/11 input A.
  No other HB powered up.
  Got a couple of errors in HB0 (extra 1's in output), gone by retrying...
  (appears to be the same error as yesterday...)
  Stop after 60K iter OK to look at HB Spy reading while not frozen
  (strange behaviour discovered by Marco).

June 14:
  MHB_TEST_RANDOM on HB #11/12. Mrg #15/11 input A.
  No other HB powered up. No fans.
  Stop after 211K iter OK.

June 15:
  MHB_TEST_RANDOM on HB #13/14. Mrg0 is #15, Mrg1 is #13, input C.
  No other HB powered up. No fans.
  47K iter OK, then switch to Mrg0 is #12, Mrg1 is #13, input A.
  Then 17K iter, then stopped by spurious SVT_INIT issued by merger test
  script. Total ~70K iter OK.

June 16:
  MHB_TEST_RANDOM on HB #15/16. Mrg0 is #12, Mrg1 is #13, input A.
  No other HB powered up. No fans.
    2K iter OK before starting log.
   25K iter OK before glitch while inserting board in next slot.
   47K iter OK before AM_Map read on HB #16 fails while bad merger in the
       same crate is being tested.
  678K iter OK until stopped Monday morning at 9:00 by work on merger crate.
  Total 750K OK.

June 19:
  MHB_TEST_RANDOM on HB #15/16. Mrg0 is #12, Mrg1 is #13, input A.
  No other HB powered up. No fans.
  60K iter OK.
  Stop for vacations !

July 5:
  MHB_TEST_RANDOM on HB #9/10. Mrg0 is #12, Mrg1 is #13, input A.
  Change crate to the top one (vmesvt2). No fans.
  HB #9 fails badly !#$@$!!! Set it aside.
  MHB_TEST_RANDOM on #13/10. 33K iter OK.
  Then errors (15:24): extra words in Hit_Spy of hb1 and in received data
  of both mergers. Output Spys all OK. Sounds like a merger problem.
  Maybe due to activity in crate (Mauro programming one merger...)
  Restart at 16:20. Error writing AM map to HB0 at 16:30 after 1K iter.
  Mauro was programming another merger...
  Restart at 17:10, 34K good so far.
  Stop July 6 10:20, 156K good. Total 190K good iter.

July 6:
  MHB_TEST_RANDOM on HB #14/17. Mrg0 is #12, Mrg1 is #13, input A.
  Start at 10:43. 83K OK at 19:45.
  End July 7 17:15 at 296K iter.

July 7:
  MHB_TEST_RANDOM on HB #2/18. Mrg0 is #12, Mrg1 is #13, input A.
  Start at 17:25. In the meanwhile keep debugging HB #9 that failed.
  Global error (both HB on all spys) after 6K iter... no clue.
  Restart at 19:30, 10K OK. Let it run.
  253K iter OK July 8.
  142K iter OK July 9.
  End test July 10 11:45 with 300K iter OK.

July 10:
  MHB_TEST_RANDOM on HB #3/4. Mrg0 is #12, Mrg1 is #13, input A.
  These are the HB's returned to Eclipse for fixing LED resistors.

  Now keep debugging new problems in HB #9.
  2 problems found at first run of hb_hlm_test.csh:
  1: bit 0 of output data gets stuck at 1 after it is set to 1 the first
     time, both in output and OSPY.
     Put scope probe on input to OUT0 Xilinx (pin 3, using clip)
     ==> slow discharge to 0, like a resistive contact.
     Resolder pin 3 of OUT0 and driver (pin 7 of MLDATA) (Mauro). No change.
     Investigate with Ohm-meter ==> BAD VIA underneath MLDATA:
       R: pin 7 - pad of MLDATA                            = 0
       R: pin 3 of OUT0 - via with GDATA0 label            = 0
       R: via with GDATA0 label - via underneath MLDATA    = 0
       R: via underneath MLDATA - pin 7 of MLDATA          = 5 Mega !!
     Mauro tries to fill the via under the chip with solder from the
     back... now it works !
  2: after a while End Event gets completely corrupted and stays like that,
     sort of random bits each time, until data are sent again... then again
     the first 20 events or so are OK. Hit/Road spys are OK. No error.
     Out spy is the same as output. Suspect End Event bit to MLDATA...
     E.g.:
       expect:   find:
       200013    200013
       600013    600013
       200014    200014
       600114    767d7d
       200015    200015
       600015    707874
       200016    200016
       600016    64705f
     and so on. Apparently gets the same data in output each time, so not
     fully random.
     Also hits are being lost and corrupted. If only data for multiplicity 1
     (instead of 0) are sent, always gets only 4 hits in output (instead
     of 8), with bad content... But Hit Spy is OK.
     So problem seems to be MLDATA ... ?
     Still no error, no hit overflow, no lostsync...
     Narrow down to the problem of 4 hits coming out instead of 8 for the
     first event with multiplicity 1. Also the 4 hits are all 0.
     Discover wrong value for OEHIT_: explains the all-0 content, but not
     why only 4.
     Track OEHIT to a bad connection between READY line and READY pin on
     OUTEN (pin 42): measure 50K between via with READY label and pin 42 of
     OUTEN (XU3). Fixed temporarily, again with the same method as in 1:
     infiltrating solder into the via.
  Also discover a third problem:
  3: AMAPD14 in output from HRVME is always 0, while CMAPAD14 and BMAPAD14
     work fine. Find no sign of shorts or similar. Eventually decide to
     disconnect the pin. Damage pcb pad in the process (sorry).
     Looking at the signal on the pin once lifted from the pcb, it is still
     always zero. ==> output drive of Xilinx is broken.
     Solder wire between vias for AMAPADD14 (near ENMAP/XU9) and BMAPADD14
     (near U31). BEWARE: now BMAPADD14 has HUGE fanout.
     Operation difficult, vias are closed by solder mask (?), so have to
     surface solder the wire just on top... looks like it can stay...
     It appears to work now. BMAPADD14 rise time is about 5 nsec on last
     RAM of AMAPDD14 chain (U43 = AM3 l), the farthest point from the source.

  Now HB #9 passes hb_hlm_test.csh and 10 iterations of AM_MAP random r/w
  with hbft.
  Ran MHB_TEST_RANDOM on HB #9. Mergers 0/1 are #06/14.
  Merger #14 gave some errors in reading Aspy when used as #0, so moved
  to #1.
  At 17:15 4K iter OK. Stop/restart to replace Merger #14 with #17 as Mrg 0.
  Now Mergers 0/1 are #17/06.
  Stop at 6K iter OK to measure rise times with scope.
  Rise times on farthest (rightmost) AM_MAP chip, measure 4 times:
    1. BMAPADD14 (= AMAPADD14) on pin 20 of U37
    2. AMAPADD14 on pin 30 of U43
    3. BMAPADD4 on pin 19 of U37
    4. AMAPADD5 on pin 1 of U43
  All are very similar: rise to 3V in about 6 nsec, full swing 0-4V in 10 ns.
  Expect 1. ~= 2., and this is confirmed (not obvious though).
  Expect, from faster to slower: 3 - 4 - 1/2.
  Find instead 4 is fastest, then 1/2, then 3 is the slowest.
  Anyhow the difference is always a fraction of a nsec. Probably rise time
  is dominated by RC termination.
  18:05 restart MHB_TEST_RANDOM on HB #9. Mergers 0/1 are #17/06.

July 12:
  11:40 stop HB #9 random test, overall 200K iter OK after all fixes.

End HB tests. HB production is over. Now ship to Fermilab.

Summary of passed tests for each HB:
#  | hbft | hb_hlm_test | mhb_test_random | ship to FNAL               |
1  | todo | todo        | todo            | there already, must recall |
2  | OK   | OK          | 300K            | never                      |
3  | OK   | OK          | uncounted       | July xx 00                 |
4  | OK   | OK          | uncounted       | July xx 00                 |
5  | OK   | OK          | 100K            | July xx 00                 |
6  | OK   | OK          | 100K            | July xx 00                 |
7  | OK   | OK          | 300K            | July 14 00 AZ to PJW       |
8  | OK   | OK          | 300K            | July 14 00 AZ to PJW       |
9  | OK   | OK          | 200K            | July 17 00 DHL to Franco   |
10 | OK   | OK          | 250K            | July xx 00                 |
11 | OK   | OK          | 200K            | July xx 00                 |
12 | OK   | OK          | 200K            | July xx 00                 |
13 | OK   | OK          | 260K            | July xx 00                 |
14 | OK   | OK          | 370K            | July xx 00                 |
15 | OK   | OK          | 750K            | June 20 00 DHL to Bill     |
16 | OK   | OK          | 750K            | July xx 00                 |
17 | OK   | OK          | 360K            | not for now                |
18 | OK   | OK          | 360K            | not for now                |
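
Cross-check of the May 31 observation that the first 3 extra words seen in
merger 1 are a "subset" of the last good EndEvent word 6020a7. This is only a
minimal Python sketch, not part of the HB test programs: the helper names
(is_bit_subset, dropped_bits) are made up for illustration, and the words are
copied from the May 31 entry. "Subset" is taken here to mean that every bit
set in the extra word is also set in 6020a7, which is an assumption about
what the entry meant.

```python
# Words copied from the May 31 entry (hexadecimal).
LAST_GOOD_EE = 0x6020A7
EXTRA_WORDS = [0x402087, 0x202087, 0x4020A7] + [0x6020A7] * 6


def is_bit_subset(word, reference):
    """True if every bit set in `word` is also set in `reference`."""
    return word & ~reference == 0


def dropped_bits(word, reference):
    """Bits that are set in `reference` but missing from `word`."""
    return reference & ~word


for w in EXTRA_WORDS:
    print(f"{w:06x}  subset={is_bit_subset(w, LAST_GOOD_EE)}  "
          f"dropped={dropped_bits(w, LAST_GOOD_EE):06x}")
```

With these words, all 9 extra words pass the subset check, and the first 3
differ from 6020a7 only by dropped bits (no extra 1's), which fits the picture
of the same word being strobed again while a few bits were still settling.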