regex anyone? (php MYSQLR)

PATX 2,820 Posts
edited August 2011 in Strut Central
Hi,
Anyone here got experience with cleaning up MySQL data with regex? PHP or whatever works. I need pointers, including how not to break everything.

  Comments


  • Grandfather 2,303 Posts
    I have some experience with regexes in general. But the thing I've learned is that they suck.
    What's the issue, dude?

  • PATX 2,820 Posts
    Hmmm, what happened is that I have a MySQL CMS, and to make the admin part more idiot proof I added the CKEditor plugin to the custom admin interface, so when people edit the records they can paste straight from Word without having to know the HTML to format it.

    Problem is that CKEditor installs SCAYT (spell check as you type, the red squiggly lines you see in the composer window here on strut for example) and these squiggly lines are in fact a whole lot of bullshit tags that have now been saved into the records. I need them out, but they are not all the same. I have removed SCAYT, but the damage is done for the records that have been edited in the meantime. It's 100+ records in multiple places.

    So I guess I need to export the SQL file, run a regex on it to remove these tags, then re-import the SQL. This scares me a bit.

    The bit I need to edit looks like this pattern:
    <span data-scayt_word="Luo-speaking" data-scaytid="6">Luo-speaking</span>

    I want to remove, in this case
    <span data-scayt_word="Luo-speaking" data-scaytid="6">

    and
    </span>
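A minimal sketch of that exact transformation, in Python rather than PHP. It assumes every bad span looks like the one sample above (the `data-scayt...` attribute comes first and the span wraps plain text); the names `SCAYT_SPAN` and `unwrap_scayt` are made up for illustration.

```python
import re

# Assumed shape, based on the single sample in the post: a span whose first
# attribute starts with "data-scayt", wrapping the misspelled word.
# The lazy (.*?) stops at the first closing </span> after the opening tag.
SCAYT_SPAN = re.compile(r'<span data-scayt[^<>]*>(.*?)</span>', re.DOTALL)

def unwrap_scayt(html: str) -> str:
    """Drop the SCAYT wrapper but keep the text inside it."""
    return SCAYT_SPAN.sub(r'\1', html)

sample = ('<p>The <span data-scayt_word="Luo-speaking" data-scaytid="6">'
          'Luo-speaking</span> community</p>')
print(unwrap_scayt(sample))  # <p>The Luo-speaking community</p>
```

Spans without a leading `data-scayt` attribute are left alone. If a SCAYT span ever contains another span, the lazy match would stop at the inner closing tag, so that case needs checking by hand.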

  • Grandfather 2,303 Posts
    http://regexpal.com/ will let you test your regex online.

    I'm guessing the data-scayt_word="blahblah" and data-scaytid="blah" values are the parts that are different?

  • PATX 2,820 Posts
    yep, those will be different every time, but always within the span tags. I guess it is reasonably straightforward but that is a relative statement.

    Thanks for the link. I am still scared of running this on a whole database! Will make many copies first.
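One way to keep the export, clean, re-import plan low-risk is to never touch the original dump. The sketch below is only an illustration: `clean_dump` and the file names are hypothetical, and the pattern assumes the spans look like the sample in this thread. It works on raw bytes so nothing else in the file is re-encoded, and since the pattern never mentions a quote character it also copes with quotes being backslash-escaped inside the dump.

```python
import re
import shutil
from pathlib import Path

# Bytes pattern, assumed from the one sample span posted in this thread.
SCAYT_SPAN = re.compile(rb'<span data-scayt[^<>]*>(.*?)</span>', re.DOTALL)

def clean_dump(dump_path: Path, out_path: Path) -> int:
    """Back up the dump, write a cleaned copy, return how many spans were unwrapped."""
    backup = dump_path.with_name(dump_path.name + '.bak')
    shutil.copy2(dump_path, backup)           # untouched safety copy
    data = dump_path.read_bytes()
    cleaned, count = SCAYT_SPAN.subn(rb'\1', data)
    out_path.write_bytes(cleaned)             # original file is never modified
    return count

# Hypothetical usage:
# n = clean_dump(Path('cms_export.sql'), Path('cms_export.clean.sql'))
# print(n, 'spans unwrapped')
```

Diffing the cleaned file against the original before re-importing shows exactly what changed, and the returned count is a quick sanity check against the number of damaged records you expect.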

  • Grandfather 2,303 Posts
    Wait, this is an obvious question, but you don't want ANY tags in the DB, right? In your records?
    You could just strip any tags with this regex:
    <[^<]+?>

    That will match any tag in a string; what you do with the matches is up to you.
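For illustration, here is that pattern dropped into Python's `re.sub` (the function name `strip_all_tags` is made up). Note how blunt it is: formatting tags like `<p>` and `<b>` go too, so it only suits records that should end up as plain text.

```python
import re

# The pattern from the post: "<", then one or more non-"<" characters
# matched lazily, then the first ">" that follows.
ANY_TAG = re.compile(r'<[^<]+?>')

def strip_all_tags(html: str) -> str:
    """Remove every tag and keep only the text between them."""
    return ANY_TAG.sub('', html)

sample = '<p>Some <b>bold</b> text</p>'
print(strip_all_tags(sample))  # Some bold text
```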

  • PATX 2,820 Posts
    Nah, I need to keep all tags except spans... so it might be easy then?

    Is this really the sexiest thread on ss today?

  • <span data-scayt[^><]*>|<.span[^><]*>


    This will match the opening span tag with a data-scayt attribute (as long as that attribute is the first attribute in the tag) and any closing span tag. This'll f up if you've got unclosed tags or your source isn't somehow well-formed....
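A quick way to see what this pattern does and does not touch, sketched in Python. `POSTED` is the pattern exactly as written above; `STRICTER` is an assumed variant that spells out the `/` instead of using `.`, which matches any character.

```python
import re

# Pattern as posted: the "." stands in for the "/" of the closing tag.
POSTED = re.compile(r'<span data-scayt[^><]*>|<.span[^><]*>')

# Assumed tighter variant: a literal "/" so only real closing tags qualify.
STRICTER = re.compile(r'<span data-scayt[^><]*>|</span\s*>')

record = ('<p><span data-scayt_word="Luo-speaking" data-scaytid="6">'
          'Luo-speaking</span> people</p>')
print(POSTED.sub('', record))    # <p>Luo-speaking people</p>
print(STRICTER.sub('', record))  # <p>Luo-speaking people</p>

# Caveat: every closing span goes, even one belonging to an unrelated span,
# whose opening tag is then left dangling.
other = '<span class="note">hi</span>'
print(POSTED.sub('', other))     # <span class="note">hi
```

So this is safe only if the SCAYT spans are the only spans in the records, or if any other spans are cleaned up in a separate pass.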

  • PATX 2,820 Posts
    Nice, I am takin all this shit to regexpal.com

  • Grandfather 2,303 Posts
    haha, that jehovWet one should work.

  • dark0 61 Posts
    You have to make sure you do a lazy match. If you do a greedy one it will grab everything between the first opening tag and the last closing tag. So if you have something like

    <span>blah</span>
    <div>blah blahb</div>
    <div>blah</div>
    <span>blah</span>


    You will end up matching the entire string instead of matching the first span and its closing tag.

    Had to edit because I forgot the spans would be removed
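The difference is easy to check on the sample above. This is a small Python sketch; `re.DOTALL` is added so that `.` also crosses the line breaks in the sample, which is an assumption about how the records are stored.

```python
import re

html = '\n'.join([
    '<span>blah</span>',
    '<div>blah blahb</div>',
    '<div>blah</div>',
    '<span>blah</span>',
])

# Greedy: .* runs as far as it can, so it ends at the LAST </span>.
greedy = re.findall(r'<span>.*</span>', html, re.DOTALL)

# Lazy: .*? stops at the FIRST </span> after each opening tag.
lazy = re.findall(r'<span>.*?</span>', html, re.DOTALL)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

The single greedy match is the whole four-line string, divs included, while each lazy match is one `<span>blah</span>`.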

  • PATX 2,820 Posts
    Excellent point and I am thinking intern route right now LOL

  • Brian 7,618 Posts
    nerds