HP Forums
50g: copy all stored objects to SD card individually - Printable Version



50g: copy all stored objects to SD card individually - DavidM - 10-13-2016 07:55 PM

A user on the HP Support Forums (andy11) posted a question not long ago about
merging the contents of several 50g systems that he owns. The inevitable
discussion ensued, with the final suggestion being to simply copy the needed
objects to his SD card, then merge them as needed for redistribution to the various
50g units.

Most users know that you can easily ARCHIVE the 50g HOME contents to the SD
card, but that puts everything into a single file on the card. Automating the
copying of individual calc-based objects to the SD card seemed like a useful
feature to me, so I decided to give it a shot. I've managed to put together a
working prototype of a library that copies all non-hidden objects in HOME
(including subdirectories) to the SD card, placing each object in the SD
directory that corresponds to its location on the calculator.

I knew there'd be some file system and character set limitations that I'd run
into, and I was able to mitigate the more obvious ones with reasonable success
-- at least for my particular setup. The 50g SD card implementation seems to be
OK with some extended features. I'm wondering how these accommodations will
translate to other platforms, though.

Here are some of the issues that I've run into and how I've handled them so far:

Case Sensitivity

The calculator is case-sensitive regarding names, but the SD card file system
isn't. As such, it's possible to have multiple objects on the calculator in the
same directory that would not be allowed on the SD card (e.g. "MyObj", "myobj",
"MYobj", etc.). If copying the object to the SD card generates a name conflict,
a suffix is added (starts with "2", then goes up to "20" before giving up).
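
The suffix rule described above can be sketched in Python for illustration. This is not the library's SysRPL code: the function name `unique_sd_name`, the idea of tracking existing names in a set, and appending the number directly to the name are all assumptions made for the example.

```python
def unique_sd_name(name, existing, max_suffix=20):
    """Return a name that does not clash case-insensitively with `existing`.

    `existing` holds the names already present in the target SD directory.
    Tries `name`, then name2 .. name20, and returns None when all are taken.
    (Appending the number directly to the name is an assumption.)
    """
    taken = {n.upper() for n in existing}
    if name.upper() not in taken:
        return name
    for i in range(2, max_suffix + 1):
        candidate = f"{name}{i}"
        if candidate.upper() not in taken:
            return candidate
    return None  # give up after "20", as described in the post


used = set()
for obj in ["MyObj", "myobj", "MYobj"]:
    sd_name = unique_sd_name(obj, used)
    used.add(sd_name)
    print(obj, "->", sd_name)
# MyObj -> MyObj
# myobj -> myobj2
# MYobj -> MYobj3
```

Note that the third name skips "2" because "MYobj2" would itself collide with "myobj2" on a case-insensitive file system.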

Character Set

The 50g character set includes many valid symbols for object names that would
not be suitable for the file system of the SD card. I've used a simplistic
approach for this; all characters including A-Z, a-z, 0-9, and "." are left
intact. Any other characters in an object name get converted to "$xx", where
"xx" is the hex code for the character.
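
As an illustration of the "$xx" scheme, here is a minimal Python sketch that treats an object name as a sequence of single-byte RPL character codes. The function names and the choice of uppercase hex digits are assumptions; the post does not say which case the library uses.

```python
# Characters left intact, per the post: A-Z, a-z, 0-9 and "."
SAFE = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.")


def encode_name(raw: bytes) -> str:
    """Map calculator name bytes to an SD-safe name using $xx escapes."""
    out = []
    for b in raw:
        ch = chr(b)
        out.append(ch if ch in SAFE else f"${b:02X}")
    return "".join(out)


def decode_name(sd_name: str) -> bytes:
    """Reverse encode_name(); "$" is always followed by two hex digits."""
    out = bytearray()
    i = 0
    while i < len(sd_name):
        if sd_name[i] == "$":
            out.append(int(sd_name[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(sd_name[i]))
            i += 1
    return bytes(out)


print(encode_name(b"A\x8dB"))   # A$8DB  (8Dh is the RPL code for the right arrow)
print(decode_name("A$8DB"))     # b'A\x8dB'
```

Because "$" is not in the safe set, it is escaped as "$24", which keeps the mapping reversible.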

Object Name Length

This one surprised me. I originally thought that the 50g SD card file system
wouldn't accept long-ish file names, but it seems to be OK with them. At least
up to my test case of 27 characters. I haven't tried longer ones yet.

At present, both my 50g and my Win7x64 system are able to see and access each
stored object on the SD card in this manner. I'm wondering, though, if there are
more strict requirements on other platforms that might access the SD card
contents. Are long/mixed-case file names acceptable for 50g SD cards on
Mac/Linux/other systems? Are there other problems not handled by the above
methods?

The library is a work-in-progress, so I'd rather not post it just yet. But if
you're interested in trying it out, send me a PM with your email address and
I'll send it along. In the meantime, I'd appreciate any thoughts you may have
regarding pitfalls or better approaches to the above.

- David


RE: 50g: copy all stored objects to SD card individually - matthiaspaul - 10-14-2016 09:22 AM

(10-13-2016 07:55 PM)DavidM Wrote:  Case Sensitivity

The calculator is case-sensitive regarding names, but the SD card file system
isn't. As such, it's possible to have multiple objects on the calculator in the
same directory that would not be allowed on the SD card (e.g. "MyObj", "myobj",
"MYobj", etc.). If copying the object to the SD card generates a name conflict,
a suffix is added (starts with "2", then goes up to "20" before giving up).
There isn't much you can do about that within the limits of the original file system, but this approach will create problems when you want to restore the contents on the calculator later on - how should the calculator decide if the number was a regular part of the name or a suffix? Possibly, a more refined solution would be to copy the approach used in newRPL - it also uses tilde suffixes for the SFNs (short file names), but adds semicolons to the LFNs (long file names). See:

http://www.hpmuseum.org/forum/thread-4645-post-59021.html#pid59021
https://sourceforge.net/p/newrpl/sources/ci/master/tree/firmware/include/fsystem.h

Quote:Character Set

The 50g character set includes many valid symbols for object names that would
not be suitable for the file system of the SD card. I've used a simplistic
approach for this; all characters including A-Z, a-z, 0-9, and "." are left
intact. Any other characters in an object name get converted to "$xx", where
"xx" is the hex code for the character.
Since you never mentioned the name of the file system: the calculator uses the industry-standard FAT file system (FAT12, FAT16 or FAT32, depending on the size of the medium). This is not specific to SD cards in any way. The calculator also supports VFAT LFNs on any of the underlying FAT12, FAT16 or FAT32 filesystems.

The FAT file system itself is character set agnostic; you can store file names in any OEM character set you want for as long as the operating environment (and the user) is able to cope with the possibly strange looking characters.

However, for maximum compatibility with environments lacking the resource files to support the RPL character set, the best approach is to translate the SFNs into codepage 437 (and optionally into codepage 850, 858 or 819 and perhaps a custom 1:1 "pass-through" codepage) - and replace untranslatable characters by "_". For untranslatable characters, this is a one-way process, of course, but this does not cause problems for as long as the VFAT LFNs are used as well - they can be utilized to recover the original characters.
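
A minimal Python sketch of the "translate, else use _" idea follows. It assumes the name has already been mapped from the RPL character set to Unicode (for example with the table linked below); the function name `to_sfn_bytes` is hypothetical, and the sketch does not enforce the 8.3 length or the other SFN rules.

```python
def to_sfn_bytes(name: str, codepage: str = "cp437") -> bytes:
    """Encode `name` into an OEM codepage, using "_" for untranslatable characters."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            out += b"_"          # one-way: the original character is lost here
    return bytes(out)


print(to_sfn_bytes("ì"))      # b'\x8d'
print(to_sfn_bytes("π"))      # b'\xe3'
# Python's cp437 codec treats 1Ah as a control code rather than an arrow glyph,
# so U+2192 has no mapping in this sketch and falls back to "_".
print(to_sfn_bytes("A→B"))    # b'A_B'
```

The lost characters are why the post pairs this with a LFN: the long name keeps the information the short name cannot hold.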

The VFAT long file names should be converted to Unicode. See https://en.wikipedia.org/wiki/RPL_character_set#Code_page_layout for a suggested translation table.

Quote:Object Name Length

This one surprised me. I originally thought that the 50g SD card file system
wouldn't accept long-ish file names, but it seems to be OK with them. At least
up to my test case of 27 characters. I haven't tried longer ones yet.
SFNs are always limited to 8+3 characters in FAT, whereas VFAT LFNs can be officially up to 255 characters long (the design of the filesystem actually supports up to 403 characters, but not all platforms support this). In fact, on some platforms, VFAT LFNs are (artificially) limited to the maximum absolute path length for SFNs due to restrictions on internal buffer sizes - on some platforms this can be as low as 66 characters.

Greetings,

Matthias


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-14-2016 07:30 PM

(10-14-2016 09:22 AM)matthiaspaul Wrote:  There isn't much you can do about [case sensitivity] within the limits of the original file system, but this approach will create problems when you want to restore the contents on the calculator later on - how should the calculator decide if the number was a regular part of the name or a suffix? Possibly, a more refined solution would be to copy the approach used in newRPL - it also uses tilde suffixes for the SFNs (short file names), but adds dots to the LFNs (long file names).

My first implementation included storing a log file which shows the "before and after" object names, which could be used (and possibly automated) to match the new names to the old where clarification is needed. The initial focus for this wasn't really in the area of backups so much as copying the discrete objects to an external medium. I haven't really thought much about automating the reverse process, as I have mostly thought of this as a one-way function. For purposes of backing up, my personal preference is to use a completely different approach anyway. I suppose that's why I didn't see this in that context.

Also, I should probably note that I'm using SysRPL for this. I'm not aware of any support it provides for manipulating the SFNs and LFNs independently when copying. If anyone knows how to achieve this with SysRPL, I'd be interested in knowing it. This may be possible while using HPGCC, but that's not an option I'm interested in pursuing at this point.

(10-14-2016 09:22 AM)matthiaspaul Wrote:  The FAT file system itself is character set agnostic, you can store file names in any OEM character set you want for as long as the operating environment (and the user) is able to cope with the possibly strange looking characters.

And therein lies one of the problems I ran into, and it's why I implemented the translation to the "$xx" sequences in the first place. I originally copied the objects over with the ID names passed through without translation, but quickly discovered that my Win7x64 computer steadfastly refused to delete any of the objects that had "special" characters in the names. I didn't take the time to track down exactly which characters caused the problem, but I know that the problem went away when I started replacing the characters with the hex codes. So at least for my specific set of systems, leaving those characters untranslated is not a viable option. I don't have the ability to see if the same issue crops up on other platforms that access the SD card, so I'm not sure if this is only a Win7 issue.

I was planning to create a "renaming" function in the library that would reverse the character translation process for calculator-based objects, as I anticipated that it would be fairly simple to do. That wouldn't include the suffix handling discussed earlier, though. There'd still be a need to deal with that in some reasonable way.

(10-14-2016 09:22 AM)matthiaspaul Wrote:  The VFAT long file names should be converted to Unicode. See https://en.wikipedia.org/wiki/RPL_character_set#Code_page_layout for a suggested translation table.

I'd be surprised if this is doable via SysRPL. I simply haven't seen any support for lower-level access to the SD card functions. My gut tells me it's probably built into the ARM side of the house instead of the Saturn-accessible code, but I've learned the hard way not to rule too many things out on these systems. :-)

Thanks for responding! You've provided much food for thought, including whether there's much merit to pursuing this at all given the tools I'm limited to. I still feel like the concept has value, especially when considering the original poster's situation (needing to combine the contents of multiple units). But the effort to complete it may not match the utility gained. There are still other ways to do it that aren't significantly less convenient.


RE: 50g: copy all stored objects to SD card individually - 3298 - 10-15-2016 09:57 AM

This may look like a weird idea, but I'd approach this problem by simply not using the object's name directly as an SD file name. I've worked with version control systems often enough to know they usually don't do it either, so you could just use a VCS-inspired approach:
- The SD file name is built from the object's name via a hash function. You get a constant-length name (exact length depends on the function, of course) with a limited character set, usually 0-9 and a-f, so no more problems with too long names or illegal characters.
- Inside the file you store not just the object, but a list containing the object and its original name so you can restore it properly. Inside the file you don't have any char set issues, after all.

If you are worried about getting the same hash on different names, implement a collision handler.
You could use something like the ~1, ~2, ~3 suffixes on the SD file name to distinguish files with the same hash. Alternatively using the hash as a directory name containing files with that hash should work; inside the directory naming the files 1, 2, 3 etc should be enough. Then storing and recalling would need to loop through all files corresponding to the hash until the name that's stored inside matches. If there's no match, storing creates a new file, and recalling errors out.

For starters Java's string hashing function should work fine: start with 0, then for each character: multiply with 31, add character value, repeat.
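
For illustration, here is a Python sketch of that hash together with the "loop through all files with the same hash" collision handling. The 32-bit wrap mirrors Java's `int` arithmetic; the dictionary `card` merely stands in for the SD card, and the helper names `sd_file_name`, `store` and `recall` are hypothetical.

```python
def java_hash(name: str) -> int:
    """Java-style string hash: h = 31*h + c, wrapped to 32 bits."""
    h = 0
    for ch in name:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def sd_file_name(name: str) -> str:
    """Fixed-length name using only 0-9 and a-f."""
    return f"{java_hash(name):08x}"


def store(card: dict, name: str, obj) -> None:
    """Keep (original name, object) pairs in the bucket for this hash."""
    bucket = card.setdefault(sd_file_name(name), [])
    for i, (stored_name, _) in enumerate(bucket):
        if stored_name == name:
            bucket[i] = (name, obj)      # same object name: overwrite
            return
    bucket.append((name, obj))           # new name (possibly a hash collision)


def recall(card: dict, name: str):
    """Search the bucket until the stored name matches; error out otherwise."""
    for stored_name, obj in card.get(sd_file_name(name), []):
        if stored_name == name:
            return obj
    raise KeyError(name)


card = {}
store(card, "Aa", 1)
store(card, "BB", 2)                     # "Aa" and "BB" share the hash 2112
print(sd_file_name("abc"))               # 00017862
print(recall(card, "Aa"), recall(card, "BB"))   # 1 2
```

The "Aa"/"BB" pair is a well-known collision for this hash, which makes it a convenient test of the collision path.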

The best part: In theory UserRPL can do it. It might be slow though.


RE: 50g: copy all stored objects to SD card individually - matthiaspaul - 10-15-2016 12:01 PM

(10-14-2016 07:30 PM)DavidM Wrote:  Also, I should probably note that I'm using SysRPL for this. I'm not aware of any support it provides for manipulating the SFNs and LFNs independently when copying. If anyone knows how to achieve this with SysRPL, I'd be interested in knowing it.
I would be interested in this as well.
Quote:I originally copied the objects over with the ID names passed through without translation, but quickly discovered that my Win7x64 computer steadfastly refused to delete any of the objects that had "special" characters in the names. I didn't take the time to track down exactly which characters caused the problem, but I know that the problem went away when I started replacing the characters with the hex codes.
I haven't tried this myself yet. Did Windows (Explorer or CMD?) at least display those special characters correctly, or did it display some junk instead?

If they were displayed correctly, the calculator must have translated the LFNs to UCS-2 correctly (as there is sometimes more than one suitable character in Unicode, it would be interesting to learn about the exact translation vector used by the calculator). In this case, the problem might not be related to the LFNs, but to possibly invalid characters used by the calculator in the SFNs (not normally displayed by Explorer, but displayed with DIR /X under CMD).

If the LFNs were displayed as garbage, the calculator probably just passed the file name through untranslated (which would be okay for the SFNs, but obviously not for LFNs).

Yet another approach used by some embedded systems to save the memory for a translation table is to just replace characters above 127 by some dummy character like "_". However, in this case, Windows should have no problems handling the files.

Greetings,

Matthias


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-15-2016 02:13 PM

(10-15-2016 09:57 AM)3298 Wrote:  This may look like a weird idea, but I'd approach this problem by simply not using the object's name directly as an SD file name...

Not weird at all. I had thought about doing something like that when I first started thinking about the case-sensitivity issue, and realized that there would be some other benefits to separating the object from meta data pertaining to it (name, crc, timestamp, platform, src path, calc serial num, etc.). I eventually decided that it would have great utility for a backup situation, but probably wouldn't help as much for the OP's need. At least not without a complete VCS-like app on the receiving platform that knew how to sort out all the details.

It's still a good solution, though, perhaps for a different problem. And your suggestion of using a hash for the ID names is much better than my original thought of using sequential IDs of some kind.


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-15-2016 04:17 PM

(10-15-2016 12:01 PM)matthiaspaul Wrote:  Did Windows (Explorer or CMD?) at least display those special characters correctly, or did it display some junk instead? ...not normally displayed by Explorer, but displayed with DIR /X under CMD

I was using Explorer at the time. I'm not sure which code page was used for the character mapping, but 437 would be one possibility. The 50g "→" character was translated to "ì", which would fit that scenario. I seem to use that character in a lot of object names.

As an experiment, I just created an object on the SD card (from the 50g) named "A→B" using the following steps:

Code:
12345
:3:"A→B"
STO

The 50g was happy to do so, and the Filer shows the object at the root level of the card as expected (with the → shown correctly, of course).

Removing the card and putting it into my Win7 machine is where things get more interesting. Viewing the contents of the card with Explorer shows a file named "AìB", but the file can neither be read nor deleted. Windows reports that the file can't be found when those operations are attempted.

Executing "dir /x" from a command shell lists the file as having no SFN at all, and "AŹB" as the LFN (it would appear that the command shell uses a different codepage for translation than Explorer). The command shell couldn't find the file for deletion or copying, though. I tried del, copy and xcopy, all of which failed.

Then, the plot thickens. I tried another experiment, this time using the ID "AAAAA→BBBBB". The same codepage translations resulted on my Win7 system, but now there was also a SFN listed for the file. Deletion and copying on the Win7 system worked as expected for that file.

Further experimentation seems to show the following, but my limited sample isn't enough to assert these as general conditions -- just specifics that matched my test.

When creating files for the SD card FAT file system, the 50g:
- always created a LFN
- created a SFN when the LFN was longer than the 8.3 standard
- created a SFN when the LFN contains characters with mixed case

If the file on the SD card had no SFN AND it also contained a "→" character, the file could not be read or deleted by my Win7 system. I will attempt to try some other "high ascii" characters to see if those also cause problems.

Objects copied by my prototype were all OK because the "problem" characters were translated to "acceptable" ones for Win7 to do what it needed when no SFN existed.

(10-15-2016 12:01 PM)matthiaspaul Wrote:  If the LFNs were displayed as garbage, the calculator probably just passed the file name through untranslated (which would be okay for the SFNs, but obviously not for LFNs).

This has been my assumption all along. I believe the support for FAT that's built-in to the 49g+/50g is minimal (at best).

(10-15-2016 12:01 PM)matthiaspaul Wrote:  Yet another approach used by some embedded systems to save the memory for a translation table is to just replace characters above 127 by some dummy character like "_". However, in this case, Windows should have no problems to handle the files.

That was the same character I initially chose when I was first experimenting with this. :-) I quickly found, though, that doing this kind of simple swap-out caused an increase in the number of ID collisions. So I opted for the more specific hex code translation.


RE: 50g: copy all stored objects to SD card individually - Claudio L. - 10-15-2016 07:28 PM

(10-15-2016 04:17 PM)DavidM Wrote:  I believe the support for FAT that's built-in to the 49g+/50g is minimal (at best).

It actually is very minimal, which prompted a completely new file system implementation for hpgcc back in the day.
Tim wrote the SDFiler in sysRPL, on top of SDLIB (included in SDFiler, never released separately but doesn't need the filer to work). The SDLIB library provides commands you can use from userRPL/sysRPL. You mentioned you didn't want to use hpgcc, which I understand, but this "canned" solution might fit your requirements, as your code would be 100% sysRPL.
However, for your case I think I would go with the idea of replacing the names with some heavily sanitized version of the name. Then you include the name back inside the file for proper recovery (a list { name value } as mentioned in other posts sounds very appropriate). The file name doesn't have to be a hash, it can be readable text but heavily sanitized. This is only for a human to find his files quickly, the restore operation should only use the original name inside.
You could still compute a hash and append it to the name to reduce collisions of the sanitized names.
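
A rough Python sketch of "readable but heavily sanitized, plus a hash" is shown below. The choice of CRC-32 (via `zlib.crc32`), the "_" separator and the function name are assumptions made for the example; any hash would serve the same purpose.

```python
import re
import zlib


def sanitized_file_name(raw: bytes) -> str:
    """Readable part (letters and digits only) plus a hash of the original name."""
    readable = re.sub(rb"[^A-Za-z0-9]", b"_", raw).decode("ascii")
    tag = zlib.crc32(raw)                 # hash of the untouched original bytes
    return f"{readable}_{tag:08X}"


# Two names that sanitize to the same readable text still get distinct files.
a = sanitized_file_name(b"A\x8dB")
b = sanitized_file_name(b"A\x8eB")
print(a)
print(b)
print(a != b)   # True
```

The restore step would ignore this file name entirely and use the original name stored inside the file, as suggested above.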


RE: 50g: copy all stored objects to SD card individually - matthiaspaul - 10-15-2016 09:46 PM

(10-15-2016 04:17 PM)DavidM Wrote:  As an experiment, I just created an object on the SD card (from the 50g) named "A→B" using the following steps:

Code:
12345
:3:"A→B"
STO
[...]
Viewing the contents of the card with Explorer shows a file named "AìB", but the file can neither be read nor deleted. Windows reports that the file can't be found when those operations are attempted.
This observation most likely indicates that (at least) this special character wasn't translated at all by the calculator:

"→" has codepoint 8Dh in the RPL character set. The equivalent character has codepoint 1Ah in codepage 437, 850, 858 and many others, but would be difficult to display in many scenarios as it lies in the control character range.
If the calculator had translated the character to Unicode, this would have resulted in character U+2192 in the LFN.
There are two scenarios for Windows to display an "ì": It could be the result of a character U+00EC in a LFN, or of a code mapping to "ì" in a SFN (which happens to be at codepoint 8Dh in codepage 437, 850, 858 etc.).
If a LFN exists, it would certainly have been used by Windows, but there is no reasonable scenario why the calculator should have translated codepoint 8Dh into U+00EC. From this we can deduce that this file has only a SFN.
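
The codepoint reasoning above can be checked with Python's standard codecs. This is only an illustration of the codepage tables, not of what the calculator or Windows do internally.

```python
# "A→B" as the calculator would store it if it passed the RPL bytes through untranslated.
raw = bytes([0x41, 0x8D, 0x42])

for cp in ("cp437", "cp850"):
    print(cp, raw.decode(cp))    # both print: AìB

print(hex(ord("ì")))             # 0xec   -> U+00EC
print(hex(ord("→")))             # 0x2192 -> what a translated LFN would contain
```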

Quote:Executing "dir /x" from a command shell lists the file as having no SFN at all, and "AŹB" as the LFN (it would appear that the command shell uses a different codepage for translation than Explorer). The command shell couldn't find the file for deletion or copying, though. I tried del, copy and xcopy, all of which failed.
Yes and no.

Yes: Explorer obviously assumes a different codepage for translation than CMD. What codepage is displayed by CHCP under CMD?

No: While it is possible to disable SFNs in NTFS, this is impossible for FAT. Even though Microsoft calls the FAT SFNs "alias names", the SFNs are the *actual* file names, and the LFNs rather than the SFNs are optional. A LFN can be seen as some kind of "extended attribute" loosely attached to the directory entry holding the SFN (and all the other information about the file). A SFN may contain invalid characters and be distorted to the point of not resembling the corresponding LFN any more at all, but it is impossible for a SFN not to exist in the FAT filesystem.

So, basically DIR /X is displaying the SFN as if it were a LFN if no LFN exists. The file name was short enough to still fit into the 8.3 naming scheme, so it seems the calculator hasn't created a LFN even though the name contains special characters (and this is normally another reason to create a LFN).

Quote:Then, the plot thickens. I tried another experiment, this time using the ID "AAAAA→BBBBB". The same codepage translations resulted on my Win7 system, but now there was also a SFN listed for the file. Deletion and copying on the Win7 system worked as expected for that file.
In this case the name was too long to still fit into the 8.3 scheme, so the calculator must have created a LFN.

Was this file displayed as "AAAAA→BBBBB" or "AAAAAìBBBBB" in Explorer?

And how was this file displayed by DIR /X exactly? "AAAAAŹBBBBB" and "AAAAAŹ~1" or something different?

Quote:Further experimentation seems to show the following, but my limited sample isn't enough to assert these as general conditions -- just specifics that matched my test.

When creating files for the SD card FAT file system, the 50g:
- always created a LFN
- created a SFN when the LFN was longer than the 8.3 standard
- created a SFN when the LFN contains characters with mixed case

- always created a SFN
- created a LFN when the name was longer than the 8.3 standard
...
Quote:If the file on the SD card had no SFN AND it also contained a "→" character, the file could not be read or deleted by my Win7 system. I will attempt to try some other "high ascii" characters to see if those also cause problems.
That will be very interesting.

Once we know what rules are used by the calculator, we might be able to derive an "adaptive partial renaming scheme" for your library.

For example, if we knew for sure, that the calculator does not apply any translation at all and that it creates LFNs only for names not fitting into the 8.3 scheme, your library could apply different pre-translations depending on the length of the name:

For names long enough to end up as LFNs on disk, the library could pass through characters 00h..7Fh and (A0h) A1h..FFh and only translate characters 80h..9Fh (and possibly A0h) from the RPL character set using your $xx translation scheme (or another scheme, if we could find something better). Unfortunately, all RPL codepoints 80h..A0h map to Unicode codepoints larger than 00FFh, so, without any translation by the calculator and no low-level filesystem access, it is impossible to enforce more suitable characters on disk.

For names short enough to end up as SFNs, the library could apply a translation to codepage 437, 850 etc. and remove characters which are not allowed in SFNs. Alternatively, the library could add some dummy characters to ensure that the calculator will also create a LFN for the file. Perhaps the ";" could be used for this purpose as well (as long as it does not conflict with newRPL - to be sorted out)...
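
One way the length-dependent pre-translation might look is sketched below in Python. It rests on the same unconfirmed assumptions as the text (no translation by the calculator, LFNs created only for names that do not fit 8.3) and takes the padding alternative for short names. The helper names, the length-only 8.3 test and the ";" padding are illustrative only.

```python
def fits_8_3(raw: bytes) -> bool:
    """Length-only check: base of 1..8 bytes, optional extension of up to 3."""
    base, _, ext = raw.partition(b".")
    return 1 <= len(base) <= 8 and len(ext) <= 3 and b"." not in ext


def pretranslate(raw: bytes, pad: bytes = b";") -> bytes:
    """Pad short names past 8.3, then escape RPL codes 80h..A0h as $xx."""
    while fits_8_3(raw):
        raw += pad                       # hoping this makes the calculator write a LFN
    out = bytearray()
    for b in raw:
        if 0x80 <= b <= 0xA0:
            out += b"$%02X" % b          # codes with no Unicode equivalent <= 00FFh
        else:
            out.append(b)                # 00h..7Fh and A1h..FFh pass through
    return bytes(out)


print(pretranslate(b"A\x8dB"))           # b'A$8DB;;;;;;'
print(pretranslate(b"AAAAA\x8dBBBBB"))   # b'AAAAA$8DBBBBB'
```

A restore step would have to strip the padding again, which is one reason the padding character needs to be agreed on first.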

Greetings,

Matthias


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-15-2016 11:43 PM

(10-15-2016 07:28 PM)Claudio L. Wrote:  ...SDLIB (included in SDFiler, never released separately but doesn't need the filer to work). The SDLIB library provides commands you can use from userRPL/sysRPL.

One of the reasons I started this thread in the first place is that I knew if there were any responses, they would bring new ideas into the mix. I was already vaguely familiar with SDFiler, but to be honest never went through the documentation enough to realize how SDLIB was at the heart of it.

One of the reasons I had thought of this app as a "one-way street" is that I knew there was no straightforward way of getting a directory listing from the SD card (short of using SDLIB :-) ). Lacking that, retrieving arbitrary lists of files from the SD card would be quite messy, and would have to be built on top of some very restrictive assumptions.

I had originally thought that copying the files back to the calculator(s) would be done using Conn4x. That said, I realize that some people can't use (or prefer not to use) that app, and copying the files back from the SD card in their new hierarchy would be very convenient. That would certainly improve the usefulness of the app, and could open up other possibilities as well (yes, using it for backups is the first thing that comes to mind).

As an example, I could see a scenario like the following:

Using your computer, arrange your favorite calculator objects into a folder structure on the SD card that might look something like this:

Code:

<root>
    SOMEFLDR
        HOME
            Fldr1
                Obj1
                Obj2
            Fldr2
            ...
        PORT0
            PortObj1
            PortObj2
        PORT1
            PortObj1
            PortLib1
        PORT2
            PortObj1
            PortLib1
            ...

An app could then be utilized to recreate that same structure on the calculator. Different options might include things like wiping out current calculator contents before copying, default actions for when a name conflict occurs (ignore, overwrite, rename), disposition of hidden directory items, and I'm sure a lot of things that would come to mind later.

(10-15-2016 07:28 PM)Claudio L. Wrote:  However, for your case I think I would go with the idea of replacing the names with some heavily sanitized version of the name. Then you include the name back inside the file for proper recovery (a list { name value } as mentioned in other posts sounds very appropriate).

I like the concept of the list for each object, and I would include one more item: a small sentinel value of some kind to identify it as a pre-formatted "object list" appropriate for this app. That would make it easy to determine if the object should be renamed or simply copied over intact with whatever name it already had (possibly still with some translation of ID characters if appropriate).
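
The "object list with a sentinel" idea can be sketched in Python as follows. The sentinel value "SDOBJ1", the three-element layout and the function names are all hypothetical; on the calculator this would be an RPL list rather than a Python list.

```python
SENTINEL = "SDOBJ1"   # hypothetical marker identifying a pre-formatted object list


def wrap(name, obj):
    """Build the { sentinel name object } list stored in the SD file."""
    return [SENTINEL, name, obj]


def unwrap(file_name, content):
    """Return (name to use on the calculator, object).

    Wrapped files restore under their original name; anything else is
    copied intact under the file's own name.
    """
    if isinstance(content, list) and len(content) == 3 and content[0] == SENTINEL:
        return content[1], content[2]
    return file_name, content


print(unwrap("A$8DB", wrap("A→B", 12345)))   # ('A→B', 12345)
print(unwrap("PLAIN", [1, 2, 3]))            # ('PLAIN', [1, 2, 3])
```

The second call shows why the sentinel matters: an ordinary three-element list must not be mistaken for a wrapped object.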

More things to ponder!


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-16-2016 02:09 AM

(10-15-2016 09:46 PM)matthiaspaul Wrote:  What codepage is displayed by CHCP under CMD?
437

(10-15-2016 09:46 PM)matthiaspaul Wrote:  No: While it is possible to disable SFNs in NTFS, this is impossible for FAT. Even though Microsoft calls the FAT SFNs "alias names", the SFNs are the *actual* file names, and the LFNs rather than the SFNs are optional. A LFN can be seen as some kind of "extended attribute" loosely attached to the directory entry holding the SFN (and all the other information about the file). A SFN may contain invalid characters and be distorted to the point of not resembling the corresponding LFN any more at all, but it is impossible for a SFN not to exist in the FAT filesystem.

I suppose Microsoft needs to change their help output for "dir", then. :-) It states the following for the /X switch:
Code:
  /X          This displays the short names generated for non-8dot3 file
              names.  The format is that of /N with the short name inserted
              before the long name. If no short name is present, blanks are
              displayed in its place.


(10-15-2016 09:46 PM)matthiaspaul Wrote:  Was this file displayed as "AAAAA→BBBBB" or "AAAAAìBBBBB" in Explorer?

I took another look at it. Explorer shows it as "AAAAABBBBB" (no visible character between the "AAAAA" and "BBBBB"). See below for how it shows with "dir /x".

(10-15-2016 09:46 PM)matthiaspaul Wrote:  And how was this file displayed by DIR /X exactly? "AAAAAŹBBBBB" and "AAAAAŹ~1" or something different?
Code:
10/15/2016  07:52 PM                19 AAAAAì~1     AAAAA?BBBBB

(10-15-2016 09:46 PM)matthiaspaul Wrote:  
Quote:If the file on the SD card had no SFN AND it also contained a "→" character, the file could not be read or deleted by my Win7 system. I will attempt to try some other "high ascii" characters to see if those also cause problems.
That will be very interesting.

Just out of curiosity, I decided to try the "bad character test" using a different SD card. All attempted character tests were successful using that card, including the x8D test that failed before. This led me to believe that I had some type of file system corruption on the original SD card, so I reformatted it and tested again. Now all filename tests are working properly with Windows on that card as well. So it's apparent that some type of file system corruption was causing the original problem. I'll be on the lookout for more strange issues like that, and I'll retest if I start seeing other problems. So my concerns about inaccessible files are no longer a result of embedded characters in the names. Just lurking file system corruption!

(10-15-2016 09:46 PM)matthiaspaul Wrote:  Once we know what rules are used by the calculator, we might be able to derive an "adaptive partial renaming scheme" for your library.

As I am now also pondering the possibilities in moving files back from the card to the calculator, I'm actually more inclined to keep this as simple and straightforward as possible. I've definitely got to do some testing of file IDs written to the card by the computer and seeing how the calculator interprets them. As an example, I'm curious what will happen if the computer writes a LFN using Unicode (if that's even possible on a FAT-formatted card), and how the calculator will see that object's name using SDLIB.


RE: 50g: copy all stored objects to SD card individually - Claudio L. - 10-16-2016 10:33 AM

(10-16-2016 02:09 AM)DavidM Wrote:  As an example, I'm curious what will happen if the computer writes a LFN using Unicode (if that's even possible on a FAT-formatted card), and how the calculator will see that object's name using SDLIB.

If my memory serves me well, SDLIB is hard wired to use CP 850 for translation. Any Unicode characters outside that range would be replaced with '_', but I can't be 100% sure since there were several versions of that file system (for hpgcc and later hpgcc3) and I can't recall exactly which one ended up compiled into SDLIB.
The last revision made it into newRPL, and this time it supports Unicode, so no more code pages. If there's any special characters, it always creates a LFN, which is Unicode by nature. Reading is also not a problem because the entire calculator OS supports Unicode text in UTF-8, so no translation is needed.
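
As a rough illustration of the CP 850 behavior recalled above, here is a minimal Python sketch. It is not SDLIB's code: the function name `to_cp850_or_underscore` is made up for this example, and the '_' substitution simply follows the recollection in this post.

```python
def to_cp850_or_underscore(name: str) -> bytes:
    """Encode a name as CP 850, replacing unmappable characters with '_'."""
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode("cp850")
        except UnicodeEncodeError:
            # Assumption: anything outside CP 850 is replaced with '_'
            out += b"_"
    return bytes(out)


if __name__ == "__main__":
    # "ä" exists in CP 850, "Ź" does not
    print(to_cp850_or_underscore("speziäl"))
    print(to_cp850_or_underscore("AAAAAŹBBBBB"))
```

A Unicode-aware implementation such as the one described for newRPL would skip this step entirely and store the name as a LFN.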


RE: 50g: copy all stored objects to SD card individually - Claudio L. - 10-16-2016 10:37 AM

(10-16-2016 02:09 AM)DavidM Wrote:  Just out of curiosity, I decided to try the "bad character test" using a different SD card.

I can't recall if it was fixed in latest ROMs, but I remember the worst offending case of illegal characters was lowercase letters.

:3:"lcase" STO

used to produce a SFN only (because the name fits in 8.3) but without changing the name to uppercase. Lowercase letters are illegal characters in SFN, those files couldn't be touched with a PC but could be deleted normally with the calc.


RE: 50g: copy all stored objects to SD card individually - matthiaspaul - 10-16-2016 10:55 AM

(10-16-2016 02:09 AM)DavidM Wrote:  
(10-15-2016 09:46 PM)matthiaspaul Wrote:  What codepage is displayed by CHCP under CMD?
437
That's strange given that "Ź" isn't a character defined in codepage 437. I was assuming something like codepage 852 or 1250, even though this would not be a common default setting for a US-based system (437 is).

Well, CHCP displays the system codepage (of the subsystem), not necessarily the codepage of the display (or actually the console window). Does

MODE con:

report a different codepage under CMD?

Does the "Ź" character change to something different if CMD is switched from full-screen mode to windowed mode?

Background: Due to limitations in the character repertoire of the display font, it is possible that Windows does not show some characters correctly in windowed mode (this depends on the size of the console window and on which font and character size are selected in the console window Properties>Font). In full-screen mode it always does, because the codepage reported by MODE con: is the codepage of the font uploaded into the display adapter in full-screen mode, and it is impossible to switch the con: device to codepages which aren't supported in full-screen mode. (I don't know if this still holds true for Windows 7, but in older versions these text mode display fonts were stored in .CPI files.) However, with more complete support in newer Windows versions, I thought this would be a thing of the past...

Quote:
(10-15-2016 09:46 PM)matthiaspaul Wrote:  No: While it is possible to disable SFNs in NTFS, this is impossible for FAT. Even though Microsoft calls the FAT SFNs "alias names", the SFNs are the *actual* file names, and the LFNs rather than the SFNs are optional. A LFN can be seen as some kind of "extended attribute" loosely attached to the directory entry holding the SFN (and all the other information about the file). A SFN may contain invalid characters and be distorted to the point of not resembling the corresponding LFN any more at all, but it is impossible for a SFN not to exist in the FAT filesystem.

I suppose Microsoft needs to change their help output for "dir", then. :-) It states the following for the /X switch:
Code:
  /X          This displays the short names generated for non-8dot3 file
              names.  The format is that of /N with the short name inserted
              before the long name. If no short name is present, blanks are
              displayed in its place.
Yes, this help text is highly misleading. However, as we can see from the first sentence, Microsoft's description seems to be written from the perspective of the API, not from the perspective of the filesystem: At the API level, the normal file names users deal with (today) are the long file names, and by default the system automatically derives 8+3 short file names from these names for backward compatibility. AFAIK, this can be switched off for NTFS. However, the generation of SFNs cannot be suppressed for FAT because the data structures of the FAT filesystem fundamentally rely on these entries: the SFNs are part of the (fixed-size) 32-byte directory entries which hold all the information about a file (except for the LFN). At most, you could store blanks or invalid characters in their place, but then you'd end up with invalid SFNs, not "no SFNs". IIRC, the system will report the SFN through the LFN API if a file has no LFN.

BTW. From the DIR /X help in 4DOS: "The short filename is left blank if the short name and the long name are the same."

Quote:
(10-15-2016 09:46 PM)matthiaspaul Wrote:  And how was this file displayed by DIR /X exactly? "AAAAAZBBBBB" and "AAAAAZ~1" or something different?
Code:
10/15/2016  07:52 PM                19 AAAAAì~1     AAAAA?BBBBB
"Z" and "?" rather than "Ź"? It seems, something went wrong here in the transition of this screen copy.

Quote:Just out of curiosity, I decided to try the "bad character test" using a different SD card. All attempted character tests were successful using that card, including the x8D test that failed before. This led me to believe that I had some type of file system corruption on the original SD card, so I reformatted it and tested again. Now all filename tests are working properly with Windows on that card as well. So it's apparent that some type of file system corruption was causing the original problem.
[...]
I'll be on the lookout for more strange issues like that, and I'll retest if I start seeing other problems. So my concerns about inaccessible files are no longer a result of embedded characters in the names. Just lurking file system corruption!
Given the unusually high file system corruption rate (http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/archv017.cgi?read=126574), I suspect that the calculator is causing this corruption in the first place. If this happens again, inspecting the directory entry of an "undeletable" file with a disk editor might reveal the cause of the problem. Perhaps CHKDSK or SCANDISK would give some useful error message as well.
Quote:As an example, I'm curious what will happen if the computer writes a LFN using Unicode (if that's even possible on a FAT-formatted card).
Of course, that's possible. In the FAT filesystem LFNs are always stored in the UCS-2 format, so (putting the buggy implementation in the calculator aside) LFNs are - by definition - always expected to be given in Unicode with the sole restriction that only 16-bit Unicode characters are supported. (On higher levels, filenames may be given in the more convenient UTF-8 format, but this is just another storage format, not a different character set.)
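
To make the UCS-2 point concrete, here is a small Python sketch of how a name would be turned into 16-bit units for a LFN, under the restriction described above (only 16-bit characters). The function name `lfn_ucs2_bytes` is hypothetical, and the real on-disk layout (the units are spread over several directory entries) is deliberately left out.

```python
def lfn_ucs2_bytes(name: str) -> bytes:
    """Return the name as little-endian 16-bit units, rejecting non-BMP characters."""
    for ch in name:
        if ord(ch) > 0xFFFF:
            # Outside the 16-bit range, so not representable in plain UCS-2
            raise ValueError(f"character U+{ord(ch):04X} does not fit in UCS-2")
    # For characters in the 16-bit range, UTF-16LE and UCS-2 are identical
    return name.encode("utf-16-le")


if __name__ == "__main__":
    name = "speziäl"
    print(len(lfn_ucs2_bytes(name)), "bytes as UCS-2")
    print(len(name.encode("utf-8")), "bytes as UTF-8")
```

The same seven characters take 14 bytes as UCS-2 and 8 bytes as UTF-8, which shows that the two are different storage formats for the same character set.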

Greetings,

Matthias


RE: 50g: copy all stored objects to SD card individually - matthiaspaul - 10-16-2016 04:57 PM

(10-16-2016 10:37 AM)Claudio L. Wrote:  I can't recall if it was fixed in latest ROMs, but I remember the worst offending case of illegal characters was lowercase letters.

:3:"lcase" STO

used to produce a SFN only (because the name fits in 8.3) but without changing the name to uppercase. Lowercase letters are illegal characters in SFN, those files couldn't be touched with a PC but could be deleted normally with the calc.
That's correct. The underlying technical reason why most operating systems won't be able to access FAT files stored with lowercase SFNs is that they upcase the filename provided by the user and compare it with the filename stored on disk, assuming that the latter is already stored in upper case. Consequently, SFNs stored in lower case will never match, and the files are never found.

In many systems the upcase vector for codepoints 00h..7Fh is hardwired assuming an ASCII character set, so characters 61h..7Ah are just translated into 41h..5Ah. However, for codepoints 80h..FFh the upcase translation can differ significantly depending on the active codepage, country settings, and operating system. For example, depending on context codepoint 8Dh ("ì" in codepage 437 and 850) would be translated to either "Ì" or "I" or not be translated at all.

This is also the reason why properly upcasing the on-disk filename later on would require knowledge about the active codepage and country setting at the time the file was originally stored. Only codepoints 61h..7Ah could be reliably upcased later on.

Knowing that the 49g+/50g firmware can or could create such invalid SFNs, perhaps newRPL should be enhanced by adding a little special case which would upcase 61h..7Ah in on-disk SFNs before filename comparisons in order to improve compatibility at least for direct file exchange. This would consume only a couple of bytes and would not conflict with normal file access, as these characters cannot normally occur in SFNs.
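
The mismatch and the proposed special case can be sketched in a few lines of Python. This is only a simplified model: real SFNs are 11 space-padded bytes, and the function names here (`upcase_ascii`, `naive_lookup`, `tolerant_lookup`) are invented for illustration, not taken from any firmware or OS.

```python
from typing import List, Optional


def upcase_ascii(raw: bytes) -> bytes:
    """Upcase only 61h..7Ah; leave 80h..FFh alone (codepage dependent)."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in raw)


def naive_lookup(user_name: str, on_disk: List[bytes]) -> Optional[bytes]:
    """Upcase the requested name and assume on-disk names are already upper case."""
    wanted = upcase_ascii(user_name.encode("ascii"))
    for raw in on_disk:
        if raw == wanted:
            return raw
    return None


def tolerant_lookup(user_name: str, on_disk: List[bytes]) -> Optional[bytes]:
    """Same search, but also upcase 61h..7Ah in the on-disk name before comparing."""
    wanted = upcase_ascii(user_name.encode("ascii"))
    for raw in on_disk:
        if upcase_ascii(raw) == wanted:
            return raw
    return None


if __name__ == "__main__":
    directory = [b"lcase"]  # invalid lowercase SFN as written by older firmware
    print(naive_lookup("lcase", directory))     # no match
    print(tolerant_lookup("lcase", directory))  # match
```

Restricting the fix to 61h..7Ah keeps it independent of the active codepage, which is the reason for leaving the upper half untouched.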

I don't have a free SD card handy right now to perform some tests myself, but wouldn't storing

:3:"lcase;;;;;;;" STO

on a 49g+/50g with HP firmware also work around the problem in a way compatible with newRPL and operating systems supporting LFNs?

Greetings,

Matthias

EDIT: See also: http://www.hpmuseum.org/forum/thread-4645-post-62759.html#pid62759


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-16-2016 05:21 PM

(10-16-2016 10:37 AM)Claudio L. Wrote:  
(10-16-2016 02:09 AM)DavidM Wrote:  Just out of curiosity, I decided to try the "bad character test" using a different SD card.

I can't recall if it was fixed in latest ROMs, but I remember the worst offending case of illegal characters was lowercase letters.

:3:"lcase" STO

used to produce a SFN only (because the name fits in 8.3) but without changing the name to uppercase. Lowercase letters are illegal characters in SFN, those files couldn't be touched with a PC but could be deleted normally with the calc.

Just checked (on a freshly formatted SD card :-) ). With ROM 2.15, created a file as described above, then transferred the card to my Win7 system. DIR/X shows a SFN of "LCASE", and I have no problem accessing the file. So it appears that has been fixed in the current ROM version.

But what you're describing seems to be a really good fit for the results I was seeing when I was experiencing problems with my "corrupted" file system on the card previously (with a couple wrinkles). It's almost as though the 50g had mucked something up on the card that was causing it to go back to its bad behavior.


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-16-2016 05:24 PM

(10-16-2016 04:57 PM)matthiaspaul Wrote:  I don't have a free SD card handy right now to perform some tests myself, but wouldn't storing

:3:"lcase;;;;;;;" STO

on a 49g+/50g with HP firmware also work around the problem in a way compatible with newRPL and operating systems supporting LFNs?

Just tried this, and both versions (with and without the semis) worked fine with Win7. See my previous post. This issue may have been fixed by v2.15 of the firmware.


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-16-2016 05:30 PM

(10-16-2016 05:24 PM)DavidM Wrote:  Just tried this, ...

One more thing I just noticed when looking at DIR/X results:
Code:
10/16/2016  01:14 PM                14 LCASE        lcase
10/16/2016  01:14 PM                14 lcase_~1     lcase;;;;;;;

Interesting that the SFN is showing as lower case here, but the file was still accessible and I had no trouble deleting it with Win7. So perhaps the problem is actually still there and Win7 was handling it anyway.


RE: 50g: copy all stored objects to SD card individually - matthiaspaul - 10-16-2016 07:26 PM

(10-16-2016 05:30 PM)DavidM Wrote:  
Code:
10/16/2016  01:14 PM                14 LCASE        lcase
10/16/2016  01:14 PM                14 lcase_~1     lcase;;;;;;;
Interesting that the SFN is showing as lower case here, but the file was still accessible and I had no trouble deleting it with Win7. So perhaps the problem is actually still there and Win7 was handling it anyway.
Very interesting! :-) The first entry is fine, whereas in the second example the SFN is obviously faulty (it should read "LCASE_~1" instead). So, the calculator is still not properly upcasing the SFN when the original filename is too long to still fit into a SFN.

There's something special about the first entry as well, and it might lead us to some bug trigger condition. It looks as if the calculator created a LFN even though this wasn't strictly required, as the filename was short enough to fit into the 8.3 scheme. So the LFN exists only to preserve the case. This is perfectly fine from the perspective of the filesystem, but the VFAT filesystem defines four optional cases in which it is possible to preserve the case of the filename without creating a separate LFN entry in the filesystem. In any of these four special cases, the filename's case is stored in 2 bits of the same 32-byte directory entry which holds the SFN: one bit defines whether the (up to 8 characters long) name part of the filename is all uppercase or all lowercase, the other bit defines whether the (up to 3 characters long) extension part of the filename is all uppercase or all lowercase. Since the "lcase" example was all lowercase, we don't know whether the calculator actually stored a LFN or just recorded the special case info. Let's see what the calculator makes of the following:

:3:"Mixcase" STO force LFN entry (unless treated as "mixcase")
:3:"CamelC" STO force LFN entry
:3:"UPCASE" STO SFN is sufficient, no LFN support required at all
:3:"SPECIÄL" STO no reason to create a LFN entry but some system may do
:3:"speziäl" STO no reason to create a LFN entry, but LFN support required
:3:"specÿal" STO no uppercase equivalent for ÿ in codepage 437 or 850
:3:"speci■l" STO special character from critical range 80h..9Fh (■ = \[] = 158)
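
For reference, the two case bits mentioned above can be decoded roughly as follows. This is a minimal Python sketch, assuming the commonly documented layout in which the flags sit in the byte at offset 12 of the 32-byte entry (08h = name part lowercase, 10h = extension lowercase). The helpers `make_entry` and `display_name` are hypothetical and ignore everything else in the entry.

```python
LOWER_BASE = 0x08  # name part is all lowercase
LOWER_EXT = 0x10   # extension part is all lowercase


def make_entry(base: bytes, ext: bytes, case_flags: int) -> bytes:
    """Build a bare 32-byte directory entry holding only SFN and case flags."""
    entry = bytearray(32)
    entry[0:8] = base.ljust(8, b" ")
    entry[8:11] = ext.ljust(3, b" ")
    entry[12] = case_flags
    return bytes(entry)


def display_name(entry: bytes) -> str:
    """Rebuild the displayed name from the SFN fields and the two case bits."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is 32 bytes long")
    base = entry[0:8].decode("ascii").rstrip(" ")
    ext = entry[8:11].decode("ascii").rstrip(" ")
    if entry[12] & LOWER_BASE:
        base = base.lower()
    if entry[12] & LOWER_EXT:
        ext = ext.lower()
    return f"{base}.{ext}" if ext else base


if __name__ == "__main__":
    print(display_name(make_entry(b"LCASE", b"", LOWER_BASE)))
    print(display_name(make_entry(b"README", b"TXT", LOWER_EXT)))
```

With this scheme "lcase" needs no LFN at all, while "Mixcase" and "CamelC" cannot be expressed by the two bits and do need one.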

Greetings,

Matthias


RE: 50g: copy all stored objects to SD card individually - DavidM - 10-16-2016 07:51 PM

(10-16-2016 07:26 PM)matthiaspaul Wrote:  Let's see what the calculator makes out of the following:

:3:"Mixcase" STO force LFN entry (unless treated as "mixcase")
:3:"CamelC" STO force LFN entry
:3:"UPCASE" STO SFN is sufficient, no LFN support required at all
:3:"SPECIÄL" STO no reason to create a LFN entry but some system may do
:3:"speziäl" STO no reason to create a LFN entry, but LFN support required
:3:"specÿal" STO no uppercase equivalent for ÿ in codepage 437 or 850
:3:"speci■l" STO special character from critical range 80h..9Fh (■ = \[] = 158)

Code:
 Directory of I:\

10/16/2016  03:35 PM                19 MIXCASE      Mixcase
10/16/2016  03:36 PM                19 CAMELC       CamelC
10/16/2016  03:36 PM                19              UPCASE
10/16/2016  03:37 PM                19              SPECI─L
10/16/2016  03:38 PM                19 SPEZIΣL      speziäl
10/16/2016  03:39 PM                19 SPEC AL      specÿal
10/16/2016  03:40 PM                19 SPECI₧L      speci?l
               7 File(s)            133 bytes
               0 Dir(s)     251,949,056 bytes free

...and yes, copying the text and pasting into a code block creates some confusion due to mixing code pages again. Here's an image of the output as it actually appears in a command shell window:
[Image: attachment.php?aid=4049]
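
One possible reading of the SFN column above is that the calculator upcases only a..z, stores the remaining bytes unchanged, and the console then shows those bytes using codepage 437. The Python sketch below reproduces the listing under that assumption. It also assumes that Ä, ä and ÿ have the same byte values on the calculator as in ISO 8859-1, and the function name `sfn_as_cp437` is made up for this example.

```python
def sfn_as_cp437(stored: bytes) -> str:
    """Upcase 61h..7Ah only, then decode the bytes the way a CP 437 console would."""
    upcased = bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in stored)
    return upcased.decode("cp437")


if __name__ == "__main__":
    samples = [
        "SPECIÄL".encode("latin-1"),  # Ä assumed to be C4h
        "speziäl".encode("latin-1"),  # ä assumed to be E4h
        "specÿal".encode("latin-1"),  # ÿ assumed to be FFh
        b"speci\x9el",                # character 158 from the test list
    ]
    for raw in samples:
        print(repr(sfn_as_cp437(raw)))
```

Under these assumptions FFh decodes to a no-break space, which would explain the gap in "SPEC AL".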